ABSTRACT

The cyclomatic complexity metric provides a means of quantifying
intra-modular software complexity, and its utility has been
suggested in the software development and testing process. In this
study, an empirical analysis was undertaken to examine the
relationship between the cyclomatic complexity and the incidence of
faults in a series of eight relatively large (from 1200 to 2400
LOC) complex programs. Each of these programs was developed from a
single program specification and subsequently subjected to rigorous
unit level testing. A comparison was also made between the
relationship of cyclomatic complexity to faults and the
relationship of Lines of Code (LOC) to faults.

The results of this study support a relationship between the
cyclomatic complexity and the incidence of faults. Further, a
relationship between LOC and faults is demonstrated. It could not
be shown that there exists a stronger relationship between
cyclomatic complexity and faults than between LOC and faults.
I. INTRODUCTION


A. IMPORTANCE OF SOFTWARE TESTING

The  question  of  software  accuracy  and  reliability  has 
become  an  important  issue  facing  software  developers, 
especially  in  the  last  decade.  There  has  been  an  explosion  in 
the  demand  for  new  software  systems.  Software  is  being 
applied  to  all  facets  of  life,  from  relatively  commonplace  and 
unsophisticated  transaction  systems  to  unforgiving  and  time 
critical  military  and  space  applications. 

While  hardware  costs  have  gone  down  in  the  past  decade, 
software  costs  are  on  the  increase,  mainly  because  software  is 
being  applied  to  increasingly  complex  and  ever  larger  systems. 
Brooks  amusingly  likens  the  unforgiving  environment  of 
software  development  to  the  precise  and  unforgiving 
incantations  of  the  mythical  medieval  sorcerer.  As  with  the 
incantations,  the  program  must  be  constructed  with  absolute 
precision.  [Ref.  1]  Software  can  be  seen  as  an  almost 
magical  solution  to  a  multitude  of  problems,  but  software 
development,  with  all  the  benefits  of  structured  design 
methodologies  and  programming  tools,  is  still  an  activity  that 
is  highly  human  intensive  and  subject  to  all  the  vicissitudes 
and imperfections that go along with being human.

The consequences of even small software mistakes can be quite
large. For example, there is the commonly told story of the
spacecraft "Pioneer" that was unable to carry out its
multimillion-dollar mission because of a single misplaced
character. As costly systems become software dependent, both in
terms of monetary value and the potential cost to human life,
system software must commensurately become more reliable and cost
efficient. Software, ideally, should be reliable both in short
term performance and in its ability to accommodate future
requirements (maintenance). McCabe states:

"That  the  issues  of  testability  and  maintainability  are 
important  is  borne  out  by  the  fact  that  we  often  spend 
half  of  the  development  time  in  testing." 

[Ref.  2] 

In the same vein Boehm states:

"In  1985,  software  costs  totaled  roughly  $11  billion  in 
the  US  Department  of  Defense,  $70  billion  in  the  US 
overall,  and  $140  billion  worldwide.  If  present  software 
cost  growth  rates  of  approximately  12  percent  per  year 
continue,  the  1995  figures  will  be  $36  billion  for  the 
DoD,  $225  billion  for  the  US,  and  $450  billion  worldwide. 
Thus  even  a  20  percent  improvement  in  software 
productivity  would  be  worth  $45  billion  in  1995  for  the  US 
and  $90  billion  worldwide."  [Ref.  3] 

Thus an improvement in testing that results in a 20 percent gain
in efficiency would be worth somewhere between 28 and 90 billion
dollars in terms of worldwide savings annually. The ability to
identify regions of code where software testing effort should be
concentrated could save developers both time and money, or result
in more reliable systems for the same money.

Software  testing,  like  other  phases  of  software 
development  such  as  design,  coding  and  debugging,  is  an 
activity  that  is  extremely  demanding  of  human  resources.  It 
is  not  the  sort  of  activity  that  can  be  greatly  improved  upon 
through  the  introduction  of  better  or  faster  hardware,  for 
example.  Furthermore,  complete  testing  is  not  theoretically 
possible.  Three  reasons  are  given  for  this  inability  to 
completely  test  a  program  in  Beizer's  "Software  System 
Testing and Quality Assurance":

"-We can never be sure that the specifications are
correct.

-No verification system can verify every correct
program.

-We can never be certain that a verification system is
correct." [Ref. 4]

Given  that  testing  is  an  activity  that  is  human  resource 
intensive  and  that  theoretically  a  program  can  never  be 
completely  tested,  then  it  is  reasonable  to  search  for  some 
tool  to  guide  testing  activities  to  minimize  commitment  of 
human resources and money. McCabe suggests the cyclomatic
complexity as just such a tool. [Ref. 2]

B. APPROACH


1.  Definition  of  Cyclomatic  Complexity 

The  cyclomatic  complexity  is  a  mathematical  technique 
that  provides  a  quantitative  basis  for  modularization  and  for 
developing  a  testing  strategy,  by  evaluating  source  code  in 
terms  of  logical  decision  points.  The  cyclomatic  complexity 
is a measure of the number of paths in a program. One problem
with  measuring  the  possible  number  of  paths  is  that  where  a 
backwards  branch  exists,  there  is  the  possibility  of  an 
infinite  number  of  paths.  Using  the  total  number  of  paths  is 
not  a  realistic  approach.  Therefore  cyclomatic  complexity  is 
defined  as  the  number  of  basic  paths  through  a  program.  The 
cyclomatic  complexity  (v(G))  of  a  program  is  derived  by 
associating  a  directed  graph  with  it.  In  figure  1,  a  directed 
graph,  each  node  (a  -  e)  represents  a  block  of  code  where  the 
flow  is  sequential,  and  the  arcs  represent  branches  in  logical 
program  structure.  [Ref.  2] 
The McCabe complexity measure is based on the application of a
graph-theoretic complexity measure to a program's structure.

"Definition 1: The cyclomatic number V(G) of a graph G
with n vertices, e edges, and p connected components is

$v(G) = e - n + p$.

Theorem 1: In a strongly connected graph G, the
cyclomatic number is equal to the maximum number of
linearly independent circuits." [Ref. 2]

Referring to figure one, an example of a linearly independent
circuit in G is: a-c-f-a, where the starting point is a and the
ending point is a. McCabe derives a simplified form of the
cyclomatic complexity equation. In this simplified equation, the
cyclomatic complexity (v(G)) equals the number of predicates
($\pi$) plus one, where each predicate is a logical decision point
such as "if-then" (refer to chapter two for a complete list of
Pascal predicates to be evaluated):

$v(G) = \pi + 1$

This  simplified  form  is  easily  automated  in  a  tool  for 
obtaining  the  cyclomatic  complexity  of  a  block  of  code.  The 
cyclomatic  complexity  measure  can  be  determined  for  each  of 
the  functions  and  procedures  of  a  given  program.  The  testing 
effort  on  the  procedures  and  functions  could,  therefore,  be 
concentrated  on  the  basis  of  higher  complexity  (v(G))  values. 
The  assumption  would  be  that  there  is  potentially  a  higher 
number  of  errors  in  these  more  complex  procedures  and 
functions.  McCabe  suggests  a  v(G)  value  of  ten,  as  an 
indicator  of  where  testing  effort  should  be  concentrated. 
[Ref.  2] 
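The two forms of the measure can be checked against each other on a
small case. The following is a minimal sketch and not part of the
original study: the function names and the if-then-else flow graph
are hypothetical, and the graph is assumed to include an extra arc
from the exit node back to the entry node so that it is strongly
connected, as Theorem 1 requires.

```python
def cyclomatic_number(edges, num_components=1):
    """Return v(G) = e - n + p for a directed graph given as (from, to) arcs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + num_components


def cyclomatic_from_predicates(num_predicates):
    """Return the simplified form v(G) = pi + 1."""
    return num_predicates + 1


# Hypothetical flow graph of a single if-then-else:
# a is the decision, b and c are the two branches, d is the join.
# The arc d -> a is the assumed exit-to-entry arc that makes the
# graph strongly connected.
IF_THEN_ELSE = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "a")]

if __name__ == "__main__":
    print(cyclomatic_number(IF_THEN_ELSE))   # 5 arcs - 4 nodes + 1 = 2
    print(cyclomatic_from_predicates(1))     # one predicate + 1 = 2
```

Both forms give v(G) = 2 for this graph, which contains exactly one
predicate.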

2. Cyclomatic Complexity Measure and Lines of Code

McCabe goes on to suggest that the measure of
cyclomatic complexity is of greater utility as an indicator of
potentially troublesome regions of code than the simple
measure of lines of code (LOC). Shortness does not always
imply simplicity. A short module consisting of a series of
twenty-five if-then's (i.e., a high cyclomatic complexity) has
a large number of possible logical paths. The cyclomatic measure,
representing the inherent complexity of a module, would be a more
reliable predictor of potentially faulty regions of code than the
simple lines of code (LOC) metric. [Ref. 2]
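The point about short modules can be made concrete with a small
calculation. This sketch assumes that the twenty-five if-then's
follow one another in sequence and are independent of each other
(the text says only "a series"), so each one is either taken or
skipped on any given path. The function names are illustrative.

```python
def cyclomatic_complexity(num_predicates):
    """Simplified form: v(G) = pi + 1."""
    return num_predicates + 1


def sequential_if_paths(num_ifs):
    """Paths through num_ifs independent if-then's placed in sequence.

    Each if-then is either taken or skipped, so the count doubles
    with every additional statement.
    """
    return 2 ** num_ifs


if __name__ == "__main__":
    NUM_IFS = 25
    print(cyclomatic_complexity(NUM_IFS))  # 26
    print(sequential_if_paths(NUM_IFS))    # 33554432
```

Under these assumptions a module of roughly twenty-five lines has a
v(G) of 26 and over thirty-three million distinct paths, while its
LOC count remains small.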

3. Use of an Experiment to Evaluate the Cyclomatic
Complexity

In light of the theoretical work developed by McCabe,
the question remains whether the predicted relationship
between the complexity metric and faults is actually borne out
through empirical analysis. There is also the question of
how the cyclomatic complexity metric compares with other
metrics.

Few  experiments  have  examined  the  incidence  of  errors 
with  respect  to  the  cyclomatic  complexity  measure.  In  a  study 
by  Walsh,  the  occurrence  of  errors  was  measured  against 
cyclomatic  complexity  in  eight  functionally  related  modules 
(essentially one large program). These modules comprised
276 procedures or programs making up a software control
loop  and  display  processing  for  the  Aegis  Naval  Weapons 
System.  The  error  data  were  derived  through  the  compilation 
of  trouble  reports.  In  the  study,  Walsh  endeavored  to 
distinguish  between  "software  errors"  and  "design  errors"  by 
interviewing  programmers  responsible  for  the  code.  The  study 
divided  the  procedures  and  programs  into  two  groups,  one  group 
where  cyclomatic  complexity  was  less  than  ten  and  the  other 
group where it was greater, in order to compare the incidence
of  errors  in  the  two  groups.  It  was  found  that  there  was  a 
21%  higher  incidence  of  errors  in  the  group  with  a  higher 
cyclomatic  complexity.  [Ref.  5] 

In  a  study  by  Schneidewind,  four  programs  were 
examined  for  the  correlation  between  the  incidence  of  errors 
and  the  cyclomatic  complexity.  The  four  programs  were 
programmed  by  the  same  programmer  in  Algol  for  execution  on 
the  IBM  360/67.  Program  one  was  141  source  statements. 
Program  two  was  712  source  statements.  Program  three  was  70 
source  statements  and  program  four  was  1084  source  statements. 
It  can  be  seen  that  three  of  the  programs  were  fewer  than  1000 
source  statements;  one  was  just  barely  over  that  number.  Two 
of the programs were in fact quite small (one and three). The
error  data  for  these  experiments  were  obtained  through 
programmer-generated  error  reports.  The  experiment  concluded 
that  program  structure  (v(G))  would  have  a  significant  effect 
on  the  number  of  errors.  In  this  study  it  was  further  stated 
that  the  relationship  between  errors  and  structure  was  not 
expressible  as  a  mathematical  function  but  serves  to  partition 
structures  into  high  or  low  occurrence  depending  on  whether 
the  cyclomatic  complexity  is  high  or  low.  [Ref.  6] 

In  a  study  by  Basili,  a  database  of  ground  support 
software for satellites was studied to investigate the
relationship between program cyclomatic complexity and errors.
In this study Basili examined systems ranging from 51,000 to
112,000 lines of Fortran source code. Each system was made up
of from 200 to 600 modules (a total of 1794 modules were
analyzed in the entire experiment). Error data were collected
from change report forms. It was the conclusion of this study
that  there  was  a  poor  correlation  of  cyclomatic  complexity 
with  errors  in  these  large  programs.  Basili  cites  a  Spearman 
rank  order  correlation  for  cyclomatic  complexity  with  faults 
of  .196.  The  low  correlations  were  attributed  to  the  fact 
that  340  of  the  652  modules  analyzed  with  regard  to  errors  had 
zero  reported  errors.  [Ref.  7] 

In  a  study  by  Ward,  two  large  scale  programs  were 
examined  for  correlation  of  errors  to  cyclomatic  complexity  at 
Hewlett Packard's Waltham Division. Projects A and B were
125,000  and  77,000  lines  of  non-commented  source  code 
respectively.  The  two  programs  were  independent  of  each 
other,  had  short  development  times  and  had  a  low  post-release 
defect  density.  The  error  database  for  the  experiment  was 
compiled  from  prerelease  development  data  that  had  been 
maintained on the programs. Ward concluded that a .8
statistical correlation was found between complexity and
defect  density.  [Ref.  8] 

In  a  study  by  Butler,  four  small  programs  from 
operational  ECM  Software  were  studied.  Program  one  was  118 
lines  of  code.  Program  two  was  89  lines  of  code. 
Program three was 173 lines of code and program four was 135
lines  of  code.  For  each  of  the  four  programs  a  qualitative 
assessment  of  the  programs  was  made  in  terms  of 
maintainability,  readability,  comprehensibility  and  "bug 
detection". The data for this qualitative assessment were
collected over a period of two years on the basis of customer
comments. It was the conclusion of this study that the
cyclomatic complexity measure provides the "best" method to
analyze  and  limit  complexity  of  a  software  program.  No  direct 
correlation  was  found  between  the  incidence  of  errors  and 
cyclomatic  complexity.  A  broad  qualitative  relationship  was 
indicated.  [Ref.  9] 

In  a  study  by  Meals,  tools  to  automate  complexity 
measures were coded in COBOL and applied against three
separate, large commercial COBOL programs. These programs
were  examined  to  determine  if  there  was  a  correlation  between 
cyclomatic  complexity  and  the  known  error  history  of  the 
software.  In  the  course  of  the  development  of  program  one,  it 
grew  from  4956  lines  to  5948.  Program  two  grew  from  4239 
lines  to  5001  lines.  Program  three  grew  from  9416  lines  to 
9425  lines.  Two  of  the  three  programs  showed  correlations 
between  the  incidence  of  errors  and  the  cyclomatic  complexity 
(however, no coefficient was given). The conclusion was that
cyclomatic complexity was useful as an indicator of
error-prone sections of code. [Ref. 10]

Henry, Kafura and Harris undertook a study of a
single, large system (the UNIX operating system) in which
cyclomatic complexity was compared to the occurrence of errors.
A  list  of  errors  was  obtained  from  the  UNIX  User's  Group.  A 
strong  correlation  (.95)  was  found  to  exist  between 
procedures  containing  errors  and  cyclomatic  complexity. 
[Ref.  11] 

Gollhofer,  Shimeall  and  Leveson  examined  cyclomatic 
complexity  with  respect  to  errors  in  a  group  of  27  independent 
implementations  of  a  multi-version  software  experiment.  These 
programs  ranged  in  size  from  400  to  800  LOC  and  had  an  average 
execution  time  of  three  minutes.  Faults  were  derived  through 
previous  testing  studies.  This  study  concluded  that  there  was 
a  strong  correlation  between  higher  v(G)  (v(G)  >  10)  and  the 
incidence  of  errors.  Nine  percent  of  the  modules  had  a  v(G) 
greater  than  ten;  these  same  modules  had  47%  of  the  errors. 
[Ref.  12] 

The  previous  experiments  that  attempted  to  correlate 
the  cyclomatic  complexity  measure  with  the  incidence  of  errors 
fall  into  two  groups:  empirical  studies  on  a  series  of 
relatively  small  (less  than  1000  lines)  programs  using  simple 
debugging efforts, and studies on a single
large program, using problem reports generated by in-field
use.  In  general,  the  empirical  work  done  to  date  suggests 
that  there  is  a  correlation  between  the  incidence  of  errors  in 
coding  and  the  cyclomatic  complexity  measure.  However,  there 
has  been  no  experiment  that  examined  the  correlation  between 
the  cyclomatic  measure  and  the  incidence  of  programming  errors 
in  a  series  of  independently  developed  versions  of  a 
relatively large, complex program (larger than 1500 lines),
each  program  having  been  subjected  to  strenuous  fault 
detection  efforts  using  a  variety  of  detection  techniques. 

C. PROBLEM STATEMENT

Larger  programs  tend  to  manifest  a  greater  incidence  of 
errors  by  virtue  of  the  increased  number  of  inter-dependencies 
that  tend  to  be  created.  [Ref.  13]  Therefore,  the 
questions  which  have  been  approached  in  this  endeavor  are: 

•  What  is  the  predictive  relationship  between  the  cyclomatic 
measure  of  program  complexity  and  the  incidence  of  errors 
in a series of larger programs, i.e., does the relationship
between  the  complexity  measure  and  the  incidence  of  errors 
hold  true  in  larger  programs  as  it  does  in  smaller  ones? 

This question addresses how well previous studies scale
up: large programs are more than a set of concatenated small
ones. In this study, the programs will be analyzed on a
version-by-version basis, rather than mixing procedures
together. Also, complex applications exhibit [illegible]
and need to be examined as well.

•  How do the results for the complexity metric compare with
other similar metrics, notably lines of code (LOC)? Do the
results  of  the  comparison  of  cyclomatic  complexity  to 
errors  and  LOC  to  errors  seem  to  parallel  each  other,  or 
does  cyclomatic  complexity  show  a  better  correlation  with 
errors  than  LOC? 

It  is  important  to  place  results  in  perspective  by  use  of 
other  measures.  The  common  complexity  measure  is  LOC.  It 
would  also  be  interesting  to  test  McCabe's  assertion  regarding 
the  utility  of  LOC  as  a  metric.  Lastly,  it  is  reasonable  to 
test  what  is  generally  known  to  be  a  high  correlation  between 
cyclomatic  complexity  and  LOC. 


D.  OVERVIEW  OF  THE  THESIS 

The  remaining  chapters  of  this  thesis  are  structured  as 
follows:  Chapter  II  provides  a  description  of  the  study 
undertaken,  presentation  of  data  and  statistical  analysis,  and 
a  discussion  of  the  data  with  an  interpretation  of  results. 
Chapter III provides a summary of results and conclusions.

II. DESCRIPTION OF EXPERIMENT AND RESULTS


A.  INTRODUCTION 

This  chapter  presents  the  methodology  used  in  this 
experiment,  and  presents  the  data  and  conclusions  generated  in 
light of the questions posed in chapter one. In review, these
questions  were: 

•  What  is  the  correlation  between  the  v(G)  of  a  procedure  or 
function  and  the  incidence  of  errors? 

•  How  does  the  cyclomatic  complexity  compare  with  Lines  of 
Code  (LOC)  as  a  predictor  of  errors? 

B. EXPERIMENTAL METHODOLOGY

1. Tool for Computing V(G)

To  make  the  comparison  between  the  incidence  of  found 
errors  and  v(G),  the  v(G)  had  to  be  determined  for  each 
procedure  and  function  within  each  of  the  eight  programs, 
described in section C of this chapter. In accordance with the
formula:

$v(G) = \pi + 1$

cyclomatic complexity was calculated on the basis of
predicates ($\pi$) + 1. The following Pascal expressions
constitute the predicates that were counted:

    if-then's
    or's
    while
    for
    case (conditions)
    repeat

    Pascal Predicates


Because  of  the  size  of  the  programs  involved  it  was 
decided  that  the  best  approach  to  determining  the  number  of 
predicates  would  be  to  develop  a  program  to  automate  the 
process  on  a  modular  (procedure  and  function)  basis.  The 
basis for development of such a program is lexical
analysis. The C programming language was chosen for this
application.

A software tool was then designed and coded with
two related functional components. The first functional
component of the program is the lexical analyzer. The lexical
analyzer reads in the input stream (program) one character at
a time. On the basis of the Standard Pascal Reserved Word
List, the lexical analyzer identifies what in lexical-analysis
parlance are termed "tokens". One token is returned each time
the lexical analyzer is called. The lexical analyzer was
constructed such that each output token is composed of
the stream of characters making up the token and the token
identity. The second functional component of the program is
the parser. The parser calls the lexical analyzer, and
through recursive descent the parser analyzes the tokens in
terms of all the legal constructs in the Standard Pascal
language. Lastly, on the basis of the recognition of the
logical constructs, the predicates are counted. The program
outputs each procedure or function name and the associated
v(G) value.
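The counting step can be illustrated with a much simpler stand-in
for the tool. The sketch below is not the C program described above:
it is a keyword-level approximation written in Python, with no
recursive-descent parser, and all names in it are hypothetical. It
strips comments and string literals, then counts the reserved words
if, or, while, for and repeat. Counting the individual conditions of
a case statement needs the parser and is deliberately left out, so
the result understates v(G) for code that uses case.

```python
import re

# Reserved words counted as predicates in this simplified sketch.
# Case conditions are omitted because counting them requires parsing.
PREDICATE_KEYWORDS = {"if", "or", "while", "for", "repeat"}

# Pascal comments ({...} and (*...*)) and quoted strings are removed
# first so that words inside them are not mistaken for predicates.
_COMMENT_OR_STRING = re.compile(
    r"\{.*?\}|\(\*.*?\*\)|'(?:[^']|'')*'", re.DOTALL
)
_WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def count_predicates(pascal_source):
    """Count predicate keywords in a fragment of Pascal source text."""
    code = _COMMENT_OR_STRING.sub(" ", pascal_source)
    # Pascal is case-insensitive, so compare in lower case.
    return sum(
        1 for word in _WORD.findall(code) if word.lower() in PREDICATE_KEYWORDS
    )


def cyclomatic_complexity(pascal_source):
    """Apply v(G) = pi + 1 to the counted predicates."""
    return count_predicates(pascal_source) + 1


# Hypothetical procedure used only to exercise the counter.
SAMPLE = """
procedure Classify(x, y: integer);
begin
  { if this comment were scanned it would be miscounted }
  if (x > 0) or (y > 0) then
    writeln('positive or zero')
  else
    while x < 0 do
      x := x + 1
end;
"""

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE))  # if + or + while = 3, so v(G) = 4
```

Applying such a counter to one procedure or function at a time would
give the per-module values that the study relies on.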

C.  EXPERIMENTAL  DATA  BASE 

This  experiment  used  eight  Pascal  programs  (refer  to 
section  B  for  a  description)  that  had  already  been  subjected 
to  rigorous  debugging  efforts.  These  programs  were  written 
for  and  were  the  subject  of  Shimeall's  [Ref.  14]  comparison  of 
fault  elimination  and  fault-tolerance  techniques  and 
comparison  of  various  testing  techniques  in  terms  of  fault 
detection.  They  were  written  from  a  single  specification  for 
a  combat  simulation  problem,  derived  from  an  industrial 
specification.  Development  followed  a  standard,  controlled, 
software  life  cycle  approach.  The  development  involved  26 
upper division computer science students working in pairs.
Eight versions were eventually produced that were determined
to be adequate for the purposes of the fault detection, by
successfully executing a 15 case minimal acceptance test.
Each program had been tested for errors and 255 faults had
been identified and corrected, while preserving the
originally faulty code by using conditional compilation flags.

Table I describes the eight versions. The Modules column
gives the number of Pascal functions or procedures in each
version. The size of each version is given in terms of lines
of source code and code lines. Code lines is the size of each
version without comments and blank lines. Lastly, the number
of errors detected in each version is given in the Errors
column.


TABLE I VERSION SOURCE PROFILE

Version  Modules  Source  Code  Errors
1        72       7503    2414  35
2        56       3452    1540  11
3        41       1480    1201  33
4        57       3663    2003  26
5        28       1634    1544  25
6        72       3065    2206  24
7        75       2734    1976  23
8        57       1896    1331  16

Each  version  of  the  combat  simulation  program  was  subjected  to 
five  fault  detection  techniques.  These  fault  detection 
methodologies  were:  code  reading  by  stepwise  abstraction, 
static data flow analysis, run-time assertions inserted by
the  program  development  participants,  multi-version  voting, 
and  functional  testing  with  follow-on  structural  testing.  A 
fault  was  considered  to  exist  where  there  was  at  least  one 
associated  identifiable  misbehavior  (or  failure  to  perform  a 
required  function)  in  the  program.  A  section  of  code  was 
considered a single fault (or bug) where its correction
eliminated  at  least  one  failure.  Several  faults  could 
possibly  contribute  to  a  failure  for  a  particular  data  set, 
and  several  failures  could  be  caused  by  a  single  fault. 
[Ref.  14] 

In  the  course  of  counting  the  previously  identified 
faults,  if  there  was  more  than  one  correction  of  code 
associated  with  the  correction  of  a  particular  misbehavior, 
then  only  one  fault  was  counted. 

In  summary,  this  data  set  is  unique  because  it  provides 
eight  independently  developed  versions  of  a  single 
specification  for  a  program.  Because  all  programs  were 
developed  from  a  single  specification,  the  variability  of 
faults due to differences in design specifications can be
eliminated.  Each  version  has  been  thoroughly  debugged  using 
the  fault  detection  techniques.  The  faults  in  each  function 
and  procedure  have  been  identified.  Finally,  the  application 
is complex, with many input variables and much iteration, and
each version is larger than 1200 lines of code (see Table I).
The variability of faults with respect to cyclomatic
complexity has not been tested on multiple program versions of
this length and complexity.

D. RESULTS

Each of the programs was analyzed individually (as
opposed to grouping all the procedures and functions together
from all eight programs) in order to allow individual program
variation to be seen, thereby preventing the masking of one or
more program variances by the entire group. Also, in order to
determine if other metrics (i.e., LOC) predict faults better
than cyclomatic complexity, it was necessary to look at
multiple program cases, to estimate the population standard
deviation.

The ANOVA and Means tests were chosen for the statistical
analysis because no assumptions can be made about the population
being normally distributed, and these tests are relatively
insensitive to the violation of normality.

1.  Relationship  of  V(G)  to  Faults 

The  first  question  under  investigation  concerns  the 
correlation  between  the  incidence  of  faults  and  the  value  of 
v(G).

a. Summary of Observed Data

Figures  2-9  are  scatterplots  depicting  the 
cyclomatic complexity (on the x axis) in relation to the
incidence  of  faults  (on  the  y  axis)  in  the  procedures  and 
functions,  for  each  of  the  eight  programs.  Each  axis  is 
labelled  with  a  number  indicating  the  maximum  value  for  that 
variable  for  each  program.  An  asterisk  represents  one 
occurrence  at  the  indicated  x,y  position.  A  number  represents 
the  number  of  functions  or  procedures  occurring  at  the 
indicated x,y position. Lastly, the pound sign (#) is
indicative of ten or more occurrences of functions or
procedures at the indicated position.

Figure 2 Scatterplot Program 1 (y axis: errors; x axis: cyclo)

Figure 3 Scatterplot Program 2

Figure 4 Scatterplot Program 3

Figure 5 Scatterplot Program 4

Figure 6 Scatterplot Program 5

Figure 8 Scatterplot Program 7

Figure 9 Scatterplot Program 8

There is probably not a linear relationship between
cyclomatic complexity and faults, but inspection of the data
indicated that the majority of the functions and procedures
with faults have a cyclomatic complexity greater than seven.
There are, however, a small number of faulty procedures and
functions where there is a cyclomatic complexity less than
seven.

b. Analysis of Observed Data

To  further  explore  what  appeared,  through 
inspection  of  the  data,  to  be  a  relationship  between  higher 
values  of  v(G)  and  faults,  two  statistical  tests  were  applied 
to  the  data:  the  ANOVA  Test  and  the  Means  Test. 

(1) ANOVA Tests. Functions and procedures for
each program were first ordered according to v(G), from least
to greatest, with each associated fault count, and then were
divided into three groups:

•  v(G) less than six, the fault count mean of which is
designated $\mu_1$

•  v(G) from six to ten, the fault count mean of which is
designated $\mu_2$

•  v(G) greater than ten, the fault count mean of which is
designated $\mu_3$

The null and alternative hypotheses in the one-factor ANOVA
model are:

H01: $\mu_1 = \mu_2 = \mu_3$

Ha1: at least two of the population means are not equal.
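The grouping and the test statistic can be sketched as follows. This
is an illustration only: the (v(G), faults) pairs are invented for
the example and are not data from the study, the function names are
hypothetical, and the code stops at the F statistic because turning
F into a probability requires the F distribution, which is not in
the Python standard library. Every group is assumed to be non-empty.

```python
def group_index(vg):
    """Assign a module to one of the three v(G) groups used in the text."""
    if vg < 6:
        return 0      # v(G) less than six
    if vg <= 10:
        return 1      # v(G) from six to ten
    return 2          # v(G) greater than ten


def one_way_anova_f(modules):
    """Return the one-factor ANOVA F statistic for fault counts.

    modules is a list of (v(G), fault count) pairs, one per procedure
    or function of a single program version.
    """
    groups = [[], [], []]
    for vg, faults in modules:
        groups[group_index(vg)].append(faults)

    all_faults = [f for g in groups for f in g]
    grand_mean = sum(all_faults) / len(all_faults)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the fault counts around their own group mean.
    ss_within = sum((f - m) ** 2 for g, m in zip(groups, means) for f in g)

    df_between = len(groups) - 1
    df_within = len(all_faults) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented example data: nine modules, three per group.
EXAMPLE_MODULES = [
    (2, 0), (3, 0), (5, 1),
    (6, 1), (8, 1), (10, 2),
    (12, 2), (20, 3), (48, 4),
]

if __name__ == "__main__":
    print(round(one_way_anova_f(EXAMPLE_MODULES), 4))  # 9.8
```

A large F value indicates that the group means differ by more than
the spread within the groups would explain, which is the condition
under which the null hypothesis is rejected.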

Table  II  provides  a  summary  of  the  resultant  ANOVA  Test 
probabilities.


TABLE II ANOVA V(G) VS. FAULTS

Version  Probability
1        .00001308
2        .10820000
3        .00288300
4        .02930000
5        .10570000
6        .37630000
7        .00151100
8        .12800000


Ho1 can be rejected on the basis of four
programs (one, three, four and seven) at the .05 alpha level.
Additionally, if the alpha level of acceptance is broadened to
approximately .10, the rejection of Ho1 is marginally supported
by programs two and five (.1082 and .1057). Program eight is
further beyond the .10 alpha acceptance level and does not
support the rejection of Ho1. The mean value for all eight
probabilities is .0940 with a standard deviation of .1260.
This mean value of .0940 does not support the rejection of Ho1
at the .05 alpha level. If, however, the alpha level is
broadened to .10, then the mean for the ANOVA probabilities
does support the rejection of Ho1. The ANOVA probability of
program six is clearly an outlier. It is more than two
standard deviations (standard deviation = .1260) from the mean
value (.0940) for all eight probabilities generated by the
ANOVA tests. The behaviors that make version six an outlier
are interesting, and are explored at the end of this chapter.
If it is disregarded, then the mean value probability for the
remaining programs is .05 with a standard deviation of .05.
The mean value of this probability supports the rejection of
Ho1 at the .05 level. In summary, treating program six as an
outlier, the ANOVA tests do support the rejection of Ho1, and
do support the acceptance of Ha1, the hypothesis that the
differences between the numbers of faults in the subgroupings
of procedures and functions are due to more than chance.
Because the procedures and functions for each program were
divided into three groups on the basis of cyclomatic
complexity, a relationship between faults and cyclomatic
complexity is indicated.
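
The summary figures quoted above (mean, standard deviation, and the two-standard-deviation outlier criterion) can be checked with a short C sketch. It is an illustration only: the function names are hypothetical, the sample (n - 1) form of the variance is an assumption that happens to reproduce the reported .1260, and the outlier test compares squared quantities so that no square root is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n values. */
double mean_of(const double *x, size_t n)
{
    double s = 0.0;
    size_t i;

    assert(x != NULL && n > 0);
    for (i = 0; i < n; i++)
        s += x[i];
    return s / (double)n;
}

/* Sample variance (divisor n - 1); an assumption of this sketch. */
double sample_variance(const double *x, size_t n)
{
    double m, ss = 0.0;
    size_t i;

    assert(x != NULL && n > 1);
    m = mean_of(x, n);
    for (i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/* Nonzero when x[i] lies more than two sample standard deviations
 * from the mean; (x - mean)^2 > 4 * variance is the same condition. */
int is_two_sd_outlier(const double *x, size_t n, size_t i)
{
    double d;

    assert(i < n);
    d = x[i] - mean_of(x, n);
    return d * d > 4.0 * sample_variance(x, n);
}
```

Applied to the eight probabilities of Table II, the sketch gives a mean of about .094 and a variance of about .0159 (a standard deviation of about .126), and it flags only program six.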

(2)  Means  Tests.  To  further  substantiate  the 
results  obtained  with  the  ANOVA  tests  with  regard  to  the 
relationship  between  the  cyclomatic  complexity  and  the 
incidence  of  faults,  a  means  test  was  similarly  performed  on 
each of the eight programs. The t distribution was used in
this  test  because  the  fault  population  standard  deviation  is 
unknown.  The  procedures  and  functions  were  ordered  according 
to  v(G)  as  with  the  ANOVA  Tests,  and  then  were  divided  into 
two  groups,  those  with  a  v(G)  less  than  six  and  those  with  a 
v(G)  greater  than  five. 

•  v(G) less than six, the fault count mean of which is
designated μ1

•  v(G) greater than five, the fault count mean of which is
designated μ2

The null and alternative hypotheses in the Means model are:

Ho2: μ1 = μ2

Ha2: μ1 ≠ μ2




Table III provides a summary of the resultant Means Tests
probabilities.


TABLE III  MEANS V(G) VS. FAULTS

Version    Probability
   1       .0029840
   2       .0212000
   3       .0251000
   4       .0625000
   5       .0291000
   6       .0825000
   7       .0225000
   8       .0492000

On the basis of the probabilities derived
through the Means Tests, Ho2 can be rejected on the basis of
six programs (one, two, three, five, seven and eight) at the
.05 alpha level. If the alpha acceptance level is extended
to .10, programs four and six also support the rejection of
Ho2. The mean value for all eight probabilities is .0365 with
a standard deviation of .0264, and it supports the rejection
of Ho2 at the .05 level.

In summary, the Means Tests do reject Ho2
and support the acceptance of Ha2, the hypothesis that the
differences between the mean values of the subgroupings of
procedures and functions are due to more than chance. In
other words, a relationship between higher values of
cyclomatic complexity and the increased incidence of faults is
suggested by both the ANOVA Tests and the Means Tests.

It  is  important,  however,  to  place  this 
support  in  context.  In  real  testing  efforts,  tests  are 
carefully  planned,  and  part  of  this  planning  involves  the  use 
of  metrics.  Thus,  whether  cyclomatic  complexity  predicts 
faults  in  isolation  is  less  useful  than  whether  it  predicts 
better  than  other  commonly  used  metrics.  This  issue  is 
explored  in  the  next  section. 

2.  Relationship of Cyclomatic Complexity to Lines of Code

The  second  question  under  investigation  concerns  the 
comparison  of  cyclomatic  complexity  with  Lines  of  Code  (LOC) 
as  a  predictor  of  faults.  In  order  to  make  this  comparison, 
the  incidence  of  faults  in  the  procedures  and  functions  for 
each  program  was  examined  in  comparison  to  the  respective  LOC 
for  each  procedure  and  function. 

a.  Summary of Observed Data

Figures 9-16 are scatterplots depicting the incidence
of faults, on the y axis, against the Lines of Code, on the
x axis, of the procedures and functions for each of the eight
programs. Each axis is labelled with a number indicating the
maximum value for that variable for each program. An asterisk
represents one occurrence at the indicated x,y position. A
number represents the number of functions or procedures
occurring at the indicated x,y position. Lastly, the pound
sign (#) is indicative of ten or more occurrences of functions
and procedures at the indicated x,y position.

[Figures 9-16: scatterplots of faults (y axis) versus Lines of
Code (x axis), one for each of the eight programs]

There is probably not a linear relationship between
LOC and faults, but inspection of the data indicated that the
majority of the functions and procedures with faults have a
size, in terms of LOC, greater than 30. There are, however, a
small number of faulty procedures and functions where the
Lines of Code measurement is less than 30.

b.  Analysis of Observed Data

As  with  the  statistical  tests  to  determine  if  there 
was  any  relationship  between  the  incidence  of  faults  and  the 
cyclomatic  measure,  the  relationship  between  the  measure  of 
Lines  of  Code  (LOC)  and  the  incidence  of  faults  was  examined 
using  the  ANOVA  and  the  Means  Tests. 

(1)  ANOVA  Tests.  Functions  and  procedures  for 
each  program  were  first  ordered  according  to  LOC,  from  least 
to  greatest,  with  each  associated  fault  count.  The  data  were 
divided  approximately  into  thirds  with  respect  to  the  total 
number  of  procedures  because  there  were  large  numbers  of 
procedures  with  the  same  cyclomatic  complexity  value, 
especially  in  the  lower  cyclomatic  complexity  value  range. 




The procedures and functions were divided into the following
groups:

•  bottom third with respect to the total of procedures, the
mean fault count of which is designated μ1

•  middle third with respect to the total of procedures, the
mean fault count of which is designated μ2

•  upper third with respect to the total of procedures, the
mean fault count of which is designated μ3

The null and alternative hypotheses in the one-factor ANOVA
model are:

Ho3: μ1 = μ2 = μ3

Ha3: at least two of the population means are not equal.

Table IV provides a summary of the resultant ANOVA Test
probabilities.


TABLE IV  ANOVA LOC VS. FAULTS

Version    Probability
   1       .00019740
   2       .04930000
   3       .00617900
   4       .01770000
   5       .08410000
   6       .19170000
   7       .04070000
   8       .10230000

On the basis of the probabilities derived through the ANOVA
tests, Ho3 can be rejected on the basis of five programs (one,
two, three, four and seven) at the .05 alpha level. If the
alpha tolerance level is broadened to .10, programs five and
eight further support the rejection of Ho3. The ANOVA
probability of program six is rather high and tends not to
support the rejection of Ho3. The mean probability for all
eight programs for the ANOVA tests is .0615, which does not
support the rejection of Ho3 at the .05 alpha level. It
should be noted, however, that the probability derived through
the ANOVA test of program six is approximately two standard
deviations (standard deviation = .0637) from the mean value
(.0615) for all probabilities generated by the ANOVA tests; as
such it is an outlier. If the ANOVA probability of program six
is discarded, then the mean value probability for programs
one, two, three, four, five, seven and eight is .0429 with a
standard deviation of .0389. The mean value of this
probability supports the rejection of Ho3 at the .05 level.
In summary, if program six is considered an outlier, the ANOVA
Tests do reject Ho3 and support Ha3, the hypothesis that the
differences between the numbers of faults in the subgroupings
of procedures and functions are due to more than chance.
Because, as was previously stated, the procedures and
functions for each program were divided into three groups on
the basis of LOC, it is suggested that there is a relationship
between LOC and the incidence of faults, as was the case with
cyclomatic complexity.

(2)  Means  Tests.  A  Means  test  was  also 
performed  on  each  of  the  eight  programs  in  order  to  further 
substantiate the results of the ANOVA Tests. The t
distribution  was  used  in  this  test  because  the  fault 
population  standard  deviation  is  unknown.  The  procedures  and 
functions  were  ordered  according  to  LOC,  and  were  roughly 
divided  in  half  according  to  the  number  of  procedures  and 
functions.  This  is  entirely  analogous  to  the  division  into 
thirds  of  the  previous  section. 




The data are represented as follows:

•  bottom half with respect to the total of procedures, the
mean fault value of which is designated μ1

•  upper half with respect to the total of procedures, the
mean fault value of which is designated μ2

The null and alternative hypotheses in the Means model are:

Ho4: μ1 = μ2

Ha4: μ1 ≠ μ2


Table V provides a summary of the resultant Means Tests
probabilities.


TABLE V  MEANS LOC VS. FAULTS

Version    Probability
   1       .00173800
   2       .00932300
   3       .00768200
   4       .01250000
   5       .01720000
   6       .02750000
   7       .06980000
   8       .01160000



Ho4 can be rejected on the basis of seven programs (one, two,
three, four, five, six and eight) at the .05 alpha level.
Additionally, at the .07 alpha level program seven supports
the rejection of Ho4. The mean value of all eight probability
results from the Means Tests is .0197 with a standard
deviation of .0216. In summary, the Means Tests support the
rejection of Ho4 and the acceptance of Ha4, the hypothesis
that the differences between the numbers of faults in the
subgroupings of procedures and functions are due to more than
chance. In other words, a relationship between higher
measures of Lines of Code (LOC) and the increased incidence of
faults is suggested by both the ANOVA Tests and the Means
Tests.

c.  Comparison  of  Cyclomatic  Complexity  and  LOC  as 
Predictors  of  The  Incidence  of  Faults 

Tables  VI  and  VII  provide  a  summary  comparison  of 
the  results  obtained  from  sections  a  and  b.  The  column 
entitled  ANOVA  Cyclo  (Table  VI)  lists  the  probabilities 
derived  from  application  of  the  ANOVA  Test  to  the  programs  in 
order  to  look  at  the  faults  as  a  function  of  the  cyclomatic 
complexity.  The  column  entitled  ANOVA  LOC  (Table  VI)  lists 
the  probabilities  derived  from  application  of  the  ANOVA  Test 
to  the  programs  in  order  to  look  at  the  faults  as  a 
function  of  the  Lines  of  Code.  The  column  entitled  Means 
Cyclo (Table VII) lists the probabilities derived from
application  of  the  Means  Test  to  the  programs  in  order  to  look 
at  the  faults  as  a  function  of  the  cyclomatic  complexity.  And 
lastly,  the  column  entitled  Means  LOC  (Table  VII)  lists  the 
probabilities  derived  from  application  of  the  Means  Test  to 
the  programs  in  order  to  look  at  the  faults  as  a  function  of 
the  Lines  of  Code. 


TABLE VI  SUMMARY OF ANOVA TESTS
V(G) VS. FAULTS & LOC VS. FAULTS

Version    ANOVA Cyclo    ANOVA LOC
   1       .00001308      .00019740
   2       .10820000      .04930000
   3       .00288300      .00617900
   4       .02030000      .01770000
   5       .10570000      .08410000
   6       .37630000      .19170000
   7       .00151100      .04070000
   8       .12800000      .10230000


TABLE VII  SUMMARY OF MEANS TESTS
V(G) VS. FAULTS & LOC VS. FAULTS

Version    Means Cyclo    Means LOC
   1       .00029840      .00173800
   2       .02100000      .00932300
   3       .02510000      .00768200
   4       .06250000      .01250000
   5       .02910000      .01720000
   6       .08250000      .02750000
   7       .02250000      .06980000
   8       .04920000      .01160000

Inspection of the ANOVA tests (see Table VI) shows
that the probabilities were lower for the cyclomatic
complexity than for LOC in three of the eight programs (one,
three and seven). In the other five programs (two, four,
five, six and eight) the probabilities were lower for LOC
than for the cyclomatic complexity, possibly indicating
a stronger relationship between size (LOC) and the incidence
of faults. Inspection of the Means tests (see Table VII)
shows that the probabilities were lower for cyclomatic
complexity as opposed to LOC in only two cases (one and
seven).
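
The counts quoted above can be reproduced directly from the two columns of each summary table. The following C sketch is illustrative only; the function name count_lower is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the programs for which the probability in column a is
 * strictly lower than the probability in column b. */
size_t count_lower(const double *a, const double *b, size_t n)
{
    size_t i, count = 0;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        if (a[i] < b[i])
            count++;
    return count;
}
```

With the cyclomatic complexity column as a and the LOC column as b, the values of Table VI give three programs and those of Table VII give two.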

The  two  columns  of  data  shown  in  Table  VI  were  compared 
using  a  Means  test  to  determine  if  there  existed  a 
statistically  significant  difference  between  them.  Similarly, 
a Means test was applied to the two columns of data in Table
VII.

(1)  Means test V(G) versus LOC for ANOVA Tests.
A Means Test was performed on the probability results derived
from the ANOVA Tests on the two groups: faults as a function
of cyclomatic complexity and faults as a function of LOC (see
Table VI). The means test produced a probability of .1921,
which does not support the rejection of the equivalency of the
two groups at any reasonable alpha level. In other words, it
cannot be stated that there is a stronger relationship between
cyclomatic complexity and faults than between LOC and faults,
or vice versa, with regard to the ANOVA Test probabilities.

(2)  Means test V(G) versus LOC for Means Tests.
Additionally, a Means Test was conducted on the probability
results derived from the Means Tests on the two groups: faults
as a function of cyclomatic complexity, and faults as a
function of LOC (see Table VII). The means test produced a
probability of .0854, which does not support the rejection of
the equivalency of the two groups at the .05 alpha level.

d.  Low  LOC  with  Faults  and  Low  V(G)  with  Faults 

In  order  to  further  examine  the  relationship 
between  cyclomatic  complexity  and  LOC,  the  procedures  and 
functions  containing  faults  with  low  cyclomatic  complexity  or 
low  LOC  were  examined  to  determine  if  there  was  a  higher 
incidence  of  the  other  factor  (higher  LOC  in  the  case  of 
procedures  or  functions  with  low  cyclomatic  complexity  and 
faults  for  example)  to  explain  the  faults. 

(1)  Statistical Tests to Examine Procedures with
Low V(G) and Faults. As previously stated, it was observed by
inspection of the data that there appeared to be a higher
incidence of faults in procedures and functions with a v(G)
greater than roughly seven. The Means tests and ANOVA tests
support the contention that the differences in mean faults
between low cyclomatic complexity procedures and functions
and higher ones are due to more than chance. In this section,
faulty procedures and functions with cyclomatic complexity
values less than seven are examined to determine if there is
any support for correlation between LOC and faults in these
procedures. The procedures and functions with a v(G) of less
than seven were divided into two groups to compare LOC:

•  faults present, the mean LOC of which is designated μ1

•  no faults present, the mean LOC of which is designated μ2

The null and alternative hypotheses in the Means model are:

Ho5: μ1 = μ2

Ha5: μ1 ≠ μ2


Table VIII provides a summary of the resultant Means Tests
probabilities.


TABLE VIII  MEANS LOW V(G) LOC VS. FAULTS

Version    Probability
   1       .064000
   2       .259900
   3       .017200
   4       .025300
   5       .028900
   6       .050800
   7       .010400
   8       .006272

Programs three, four, five, six, seven and eight support the
rejection of Ho5 and the acceptance of Ha5 at the .05 alpha
level. Additionally, if the alpha level is broadened to .10,
program one supports the rejection of Ho5. The probability
for program two is more than two standard deviations (.08)
beyond the mean for all eight programs (.05) and as such is an
outlier. The mean probability for all eight programs (.05)
supports the rejection of Ho5 and the acceptance of Ha5 at the
.05 alpha level. Inspection of the data revealed that in all
programs the mean values of the LOC were higher in the groups
with faults than in the groups without them. In summary,
there does appear to be support for a relationship between the
faulty procedures and functions with a v(G) of less than seven
and higher values for LOC.

(2)  Statistical Tests to Examine Procedures with
Low LOC and Faults. It was also determined, upon inspection
of the data and from the supporting results of the Means tests
and ANOVA tests, that there was a higher incidence of faults
in procedures and functions larger than 30 LOC. As in the
previous section, the Means Test was used to examine any
correlation between v(G) and the incidence of faults in
procedures with a size of less than 31 LOC. The procedures
and functions with a Lines of Code count (LOC) of less than 31
were divided into two groups to look at v(G):

•  faults present, the mean value of v(G) of which is
designated μ1

•  no faults present, the mean value of v(G) of which is
designated μ2

The null and alternative hypotheses in the Means model are:

Ho6: μ1 = μ2

Ha6: μ1 ≠ μ2


Table IX provides a summary of the resultant Means Tests
probabilities.


TABLE IX  MEANS LOW LOC V(G) VS. FAULTS

Version    Probability
   1       .085100
   2       .364200
   3       .308700
   4       .477000
   5       .111000
   6       .181500
   7       .160400
   8       .218500

On the whole, the resultant probabilities for all eight
programs do not support the rejection of Ho6, with the
possible exception of program one. In other words, there is
virtually no support, in those procedures and functions with
a LOC size of less than 31, for a relationship between faults
and increased cyclomatic complexity.

(3)  Means test comparison of probabilities from
Tables VIII and IX. In order to verify that there was no
equivalency between the resultant probabilities obtained in
subsections (1) and (2), a Means test was conducted on the
probabilities from Tables VIII and IX. The resultant
probability was .003121; thus the equivalency of the two
groups is rejected.

3.  Summary  of  Results 

In summary, it was found that, with respect to the
programs analyzed in this study, there does appear to be a
correlation between the cyclomatic complexity and the
incidence of faults. A correlation between the simple measure
LOC and the incidence of faults was also found, but it was not
significantly different from the correlation with cyclomatic
complexity. Procedures and functions with a low cyclomatic
complexity (v(G) < 7) that contained faults did exhibit a
significant correlation between faults and LOC, but the
converse was not true, in that small procedures and functions
(LOC < 31) with faults did not tend to have higher v(G)
values than procedures without faults.

Lastly, there is the question of the results of program
six: in the case of the ANOVA test of faults with respect to
cyclomatic complexity, six was the exception and as such did
not exhibit a correlation between cyclomatic complexity and
faults. Examination of this program in closer detail with
respect to the location of faults has shown that, of those low
v(G) procedures and functions with faults, 5 of the 11 total
low v(G) faults were related to variable initialization or
assignment, 2 were parameter passing faults and one was a
calculation fault. In other words, just over 70% of the low
end errors appeared to be related to complexity factors that
are not directly linked to the incidence of decision nodes or
logical branching (cyclomatic complexity).

III.  SUMMARY AND CONCLUSIONS


A.  SUMMARY 

This  study  must  be  viewed  in  the  context  of  the  programs 
that  were  examined.  The  programs  were  developed  by  student 
coders  in  a  university  course  environment,  and  are  not 
necessarily  representative  of  a  commercial  programming 
environment.  Further,  although  the  programs  that  were 
examined  were  thoroughly  and  rigorously  debugged  using  various 
testing techniques (as indicated in chapter 2), and all eight
versions  essentially  were  subjected  to  the  equivalent  of  unit 
testing,  the  programs  were  not  further  tested  at  a  systems 
integration  level  of  testing.  These  programs,  being  multiple 
versions  of  one  design  specification,  simply  are  not  suited  to 
such  an  endeavor. 

The  results  of  this  study  have  added  to  the  general  body 
of  knowledge  concerning  the  relationship  of  the  cyclomatic 
complexity  to  the  occurrence  of  faults.  Firstly,  this 
analysis  of  multiple  versions  of  complex,  relatively  large 
programs  (ranging  in  size  from  1200  to  2400  LOC)  indicates 
that  there  is  a  correlation  between  faults  and  the  cyclomatic 
complexity measure. Further, it was found in this study that
the relationship of LOC to faults roughly parallels the
relationship of cyclomatic complexity to faults. It could not
be determined in this context whether
cyclomatic  complexity  or  LOC  had  a  stronger  relationship  to 
faults.  Additionally,  it  would  be  interesting  to  undertake  a 
similar  study,  based  on  the  more  thorough  testing  and 
accumulation  of  fault  data,  at  a  systems  integration  level  of 
testing,  and  in  a  commercial  environment. 

B.  RECOMMENDATIONS

With  regard  to  the  process  of  software  testing,  the 
cyclomatic  complexity  measure  is  supported  as  a  predictor  of 
potentially  faulty  regions  of  code,  and  as  such  it  is  a  tool 
that  the  software  manager  could  use  to  facilitate  a  more 
successful  testing  strategy  and  subsequent  testing.  On  the 
other  hand,  because  these  results  indicate  that  there  is  a 
correlation  between  larger  modules  and  faults,  and  given  that 
LOC  is  a  very  easily  obtained  metric,  the  software  manager  may 
be  well  advised  to  utilize  LOC  as  opposed  to  cyclomatic 
complexity.  A  more  cautious  approach  would,  however,  be  the 
employment  of  both  cyclomatic  complexity  and  LOC  as  mutually 
supportive  predictors  of  faulty  code  regions,  as  these  two 
metrics  both  seem  to  parallel  each  other  with  regard  to  fault 
prediction.

C.  OPEN RESEARCH QUESTIONS

There  are  several  interesting  research  questions  that  have 
arisen  in  the  course  of  this  study  which  remain  to  be  studied 
with  regard  to  complexity  issues.  There  is,  for  example,  the 
issue  of  how  to  deal  with  cases  where  the  cyclomatic 
complexity  is  not  a  predictor.  Version  six  seems  to  fall  into 
this  category,  considering  the  results  of  the  ANOVA  test. 
Another  issue  is  how  metrics  can  be  developed  to  predict 
faults  in  relatively  short  (less  than  30  LOC)  and  simple  (in 
terms  of  cyclomatic  complexity)  modules.  Intra-modular 
complexity  is  addressed  by  cyclomatic  complexity.  However, 
inter-modular  complexity,  which  is  potentially  a  rich  source 
of  faulty  code,  is  not  addressed  by  the  cyclomatic  complexity 
measure. These questions remain open to further research
work.




LIST  OF  REFERENCES 


1.  Brooks, F.P., The Mythical Man-Month, p. 8, Addison-Wesley
Publishing Company, 1975.

2.  McCabe, T.J., "A Complexity Measure," IEEE Transactions on
Software Engineering, v. SE-2, no. 4, December 1976.

3.  Boehm, B.W., TRW, "Improving Software Productivity," IEEE
Computer Magazine, v. 20, no. 9, pp. 43-57, September 1987.

4.  Beizer, B., "Software System Testing and Quality
Assurance," p. 14, Van Nostrand Reinhold, 1984.

5.  Walsh, T.J., "A Software Reliability Study Using a
Complexity Measure," AFIPS Conference Proceedings, v. 48,
pp. 761-768, 1979.

6.  Schneidewind, N.F., "An Experiment in Software Error Data
Collection and Analysis," IEEE Transactions on Software
Engineering, v. SE-5, no. 3, May 1979.

7.  Basili, V.R., Selby, R.W., Phillips, T., "Metric Analysis
and Data Validation Across Fortran Projects," IEEE
Transactions on Software Engineering, v. SE-9, no. 6, pp.
652-663, November 1983.

8.  Ward, W.T., "Software Defect Prevention Using McCabe's
Complexity Metric," Hewlett-Packard Journal, v. 40, p. 64,
April 1989.

9.  Butler, L.P., "Software Quality Assurance Cyclomatic
Complexity of a Computer Program," Proceedings of the IEEE
1983 National Aerospace and Electronics Conference, v. 2,
pp. 867-73, 1983.

10.  Meals, R.R., "An Experiment in the Implementation and
Application of Halstead's and McCabe's Measures of
Complexity," Software Engineering Standards Application
Workshop, pp. 45-50, 1981.

11.  Henry, S., Kafura, D., Harris, K., "On the Relationship
Among Three Software Metrics," Performance Evaluation
Review, v. 10, no. 1, pp. 3-10, 1981.

12.  Gollhofer, M., Predicting Errors using McCabe's Metric,
Master's Thesis, University of California, Davis, 1983.

13.  Myers, G.J., The Art of Software Testing, John Wiley &
Sons, 1979.

14.  Shimeall, T.J., An Empirical Comparison of Software Fault
Tolerance and Fault Elimination, Ph.D. Dissertation, U.C.
Irvine, 1989.




APPENDIX 


The program (parse1.c) is included as an appendix because it
was developed specifically for this study as a tool designed
to automatically calculate the cyclomatic complexity of
procedures and functions in eight relatively large (1200-2400
LOC) Pascal programs. The use of such a tool greatly
facilitates the determination of cyclomatic complexity (which
is often referred to as a lexical metric), which would
otherwise be a tedious and error-prone activity. This is
particularly true of this study, in which the programs
analyzed were very complex in terms of the level of nesting,
the depth of procedural scoping, and the length and complexity
of constructs. The programming methodology employed in this
lexical analyzer and parser is recursive descent.
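
The counting rule that such a tool applies can be illustrated with a much smaller C sketch. For a structured, single-entry, single-exit routine, McCabe's cyclomatic complexity equals the number of decision predicates plus one. The sketch below is not the appendix program: the function names are hypothetical, the input is assumed to be an array of lower-case token strings that has already been split by a lexer, and each "case" keyword is counted once, which is a simplification of counting the individual case conditions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keywords treated as decision predicates in this sketch. */
static const char *const predicates[] = {
    "if", "while", "repeat", "case", "for", "or"
};

/* Nonzero when word is one of the predicate keywords above. */
static int is_predicate(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof predicates / sizeof predicates[0]; i++)
        if (strcmp(word, predicates[i]) == 0)
            return 1;
    return 0;
}

/* v(G) of one routine body = number of decision predicates + 1. */
int cyclomatic_complexity(const char *tokens[], size_t n)
{
    int decisions = 0;
    size_t i;

    assert(tokens != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (is_predicate(tokens[i]))
            decisions++;
    return decisions + 1;
}
```

A routine body with no predicates has a v(G) of one; each additional if, while, repeat, case, for or "or" raises the value by one.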

/* program:             parse1.c
 * programmer:          Edwin A. Shuman
 * date last revision:  8 March 1990
 * computer:            Vax 11/785
 * compiler:            Berkeley C Compiler (4.3BSD)
 *
 * program description: This program is a pascal lexical analyzer and parser.
 *                      It is designed to read a pascal source file and
 *                      output a stream of tokens. Each token is comprised of
 *                      a "C" structure containing the token itself
 *                      and its identification. The tokens are interpreted
 *                      according to the Backus-Naur Form for the Pascal
 *                      Language. On the basis of the grammatical constructs
 *                      (specifically the Pascal Predicates: if-then, while,
 *                      repeat, case (conditions), for, or) the Cyclomatic
 *                      Complexity is derived.
 */


#include <ctype.h>
#include <stdio.h>

#define TRUE            1
#define FALSE           0

/* token codes for the pascal reserved words */
#define _and            257
#define _array          258
#define _begin          259
#define _case           260
#define _const          261
#define _div            262
#define _do             263
#define _downto         264
#define _else           265
#define _end            266
#define _file           267
#define _for            268
#define _function       269
#define _goto           270
#define _if             271
#define _in             272
#define _label          273
#define _mod            274
#define _nil            275
#define _not            276
#define _of             277
#define _or             278
#define _packed         279
#define _procedure      280
#define _program        281
#define _record         282
#define _repeat         283
#define _set            284
#define _then           285
#define _to             286
#define _type           287
#define _until          288
#define _var            290
#define _while          291
#define _with           292

/* token codes for identifiers, literals, operators and punctuation */
#define _ident          293
#define _int            294
#define _real           295
#define _string         296
#define _assign         297
#define _plus           298
#define _addingop       299
#define _divide         300
#define _leftpar        301
#define _rightpar       302
#define _colon          303
#define _semicol        304
#define _comma          305
#define _leftbrace      306
#define _rightbrace     307
#define _period         308
#define _pointer        309
#define _mult           310
#define _exp            311
#define _dotdot         312
#define _relational_op  313
#define _minus          314

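/* reserve is an array of structures which holds all of the pascal
 * reserved words
 */

struct {
    char word[12];
    int  value;
} reserve[] = {
    { "and",       _and },
    { "array",     _array },
    { "begin",     _begin },
    { "case",      _case },
    { "const",     _const },
    { "div",       _div },
    { "do",        _do },
    { "downto",    _downto },
    { "else",      _else },
    { "end",       _end },
    { "file",      _file },
    { "for",       _for },
    { "function",  _function },
    { "goto",      _goto },
    { "if",        _if },
    { "in",        _in },
    { "label",     _label },
    { "mod",       _mod },
    { "nil",       _nil },
    { "not",       _not },
    { "of",        _of },
    { "or",        _or },
    { "packed",    _packed },
    { "procedure", _procedure },
    { "program",   _program },
    { "record",    _record },
    { "repeat",    _repeat },
    { "set",       _set },
    { "then",      _then },
    { "to",        _to },
    { "type",      _type },
    { "until",     _until },
    { "var",       _var },
    { "while",     _while },
    { "with",      _with },
    { "any_other", _ident },
};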


/* lexrec is a type definition of a structure which is the format of the token
 * returned by lex()
 */

typedef struct {
    char tok[15];
    int  toktype;
} lexrec;

lexrec token, temptok, temptok2, temptok3, temptok4, temptok5;
lexrec stack[30];
int use_temp_tok, count;
static int line;
static int tokflag, tokflag1, tokflag2, tokflag3, tokflag4 = 0;
static int sc = -1;

/* lexdigit(c) is called if the standard input character is a digit */
/* returns token */

lexdigit(c)
char c;
{
    int d, i;
    float valuedec;

    i = 0;
    while (isdigit(c)) {
        token.toktype = _int;
        token.tok[i] = c;
        c = getc(stdin);
        i++;
    }
    if (c == 'E') {
        token.tok[i] = c;
        token.toktype = _real;
        c = getc(stdin);
        i++;
        if ((c == '+') || (c == '-') || (isdigit(c))) {
            token.tok[i] = c;
            i++;
            c = getc(stdin);
            while (isdigit(c)) {
                token.tok[i] = c;
                c = getc(stdin);
                i++;
            }
        }
        ungetc(c, stdin);
        return;
    }
    if (c == '.') {
        d = getc(stdin);
        if (isdigit(d)) {
            token.toktype = _real;
            token.tok[i] = c;
            i++;
            token.tok[i] = d;
            c = getc(stdin);
            i++;
            while (isdigit(c)) {
                token.tok[i] = c;
                c = getc(stdin);
                i++;
            }
            if (c == 'E') {
                token.tok[i] = c;
                c = getc(stdin);
                i++;
                if ((c == '+') || (c == '-') || (isdigit(c))) {
                    token.tok[i] = c;
                    i++;
                    c = getc(stdin);
                    while (isdigit(c)) {
                        token.tok[i] = c;
                        c = getc(stdin);
                        i++;
                    }
                }
            }
            ungetc(c, stdin);
            return;
        }
        else if (d == '.') {
            strcpy(temptok.tok, "..");
            temptok.toktype = _dotdot;
            use_temp_tok = TRUE;
        }
    }
    /* if the next char after the int was not 'E' or '.' */
    else {
        ungetc(c, stdin);
        return;
    }
}


/* lexalpha is called if the standard input is a letter or number */

lexalpha(c)
char c;
{
    int i;

    i = 0;
    while ((isalnum(c)) || (c == '_')) {
        token.tok[i] = c;
        c = getc(stdin);
        i++;
    }
    token.toktype = ident_token(token.tok);
    ungetc(c, stdin);
}

/* called if stdin is not a letter or a digit */
/* returns token */

lexswitch(c)
char c;
{
    int i, d;

    switch (c) {
        case '\'':    /* a string */
            token.tok[0] = '\'';
            c = getc(stdin);
            i = 1;
            while (c != '\'') {
                token.tok[i] = c;
                c = getc(stdin);
                i++;
            }
            c = getc(stdin);
            if (c == '\'') {
                c = getc(stdin);
                while (c != '\'') {
                    token.tok[i] = c;
                    c = getc(stdin);
                    i++;
                }
                token.tok[i] = '\'';
                token.toktype = _string;
                break;
            }
            else {
                ungetc(c, stdin);
                token.tok[i] = '\'';
                token.toktype = _string;
                break;
            }
            break;
        case '=':
            token.tok[0] = '=';
            token.toktype = _relational_op;
            break;
        case ')':
            token.tok[0] = ')';
            token.toktype = _rightpar;
            break;
        case ':':
            c = getc(stdin);
            if (c == '=') {
                strcpy(token.tok, ":=");
                token.toktype = _assign;
            }
            else {
                token.tok[0] = ':';
                token.toktype = _colon;
                ungetc(c, stdin);
            }
            break;
        case ';':
            token.tok[0] = ';';
            token.toktype = _semicol;
            break;
        case ',':
            token.tok[0] = ',';
            token.toktype = _comma;
            break;
        case '[':
            token.tok[0] = '[';
            token.toktype = _leftbrace;
            break;
        case ']':
            token.tok[0] = ']';
            token.toktype = _rightbrace;
            break;
        case '.':
            token.tok[0] = '.';
            token.toktype = _period;
            break;
        case '^':
            token.tok[0] = '^';
            token.toktype = _pointer;
            break;
        case '/':
            token.tok[0] = '/';
            token.toktype = _divide;
            break;
        case '+':
            token.tok[0] = '+';
            token.toktype = _plus;
            break;
        case '-':
            token.tok[0] = '-';
            token.toktype = _minus;
            break;
case  "*  : 

If  U  '=  dt  { 
token.  lok|0)  = 

token  toktvpe  =  ntull; 

> 

break: 
        case '<':
            d = getc(stdin);
            if (d == '=') {
                strcpy(token.tok, "<=");
                token.toktype = _relational_op;
            }
            else if (d == '>') {
                strcpy(token.tok, "<>");
                token.toktype = _relational_op;
            }
            else {
                strcpy(token.tok, "<");
                token.toktype = _relational_op;
                ungetc(d, stdin);
            }
            break;


case 

(1  —  gckistdmi 
If  (d  --  .  { 

sir  pvt  token  tok  '  ': 

token  lokU|v  _  rcl.tlion.il  op 

) 

else  { 

i  -  rtelv’ ’  tilin  v 
If  re"  ■  ( 

SIR  pvi  token. Ink  !. 

t"kci;  tol.tvp  tel.ition.il 

I 

else  ( 

stn  p\  i  ti d  -n  t'4  .  •  >. 


op 




token,  toktype  =  _relational_op; 
ungetc(e,stdin 1: 
ungetctd.stdin); 

> 

> 

break: 

} 

) 

/* ident_token(t) compares the token to the list of pascal reserved words
 * returning the reserved word identification if found
 */

ident_token(t)
char *t;
{
    int i, f, found;

    i = 0;
    found = FALSE;
    while (!found && strcmp(reserve[i].word, "any_other")) {
        if (!strcmp(reserve[i].word, t))
            found = TRUE;
        else
            i++;
    }
    return (reserve[i].value);
}
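The lookup performed by ident_token can be seen more easily on a reduced table. The block below is a sketch under stated assumptions rather than the listing's code: the three-keyword table, the `KW_*` codes, and the function name `classify` are hypothetical, and only the technique (a linear search terminated by an "any_other" sentinel whose value is the identifier code) is taken from the listing.

```c
#include <string.h>

/* Hypothetical codes for illustration only. */
enum { KW_BEGIN = 1, KW_END, KW_IF, KW_IDENT };

static const struct {
    const char *word;
    int         value;
} words[] = {
    { "begin",     KW_BEGIN },
    { "end",       KW_END },
    { "if",        KW_IF },
    { "any_other", KW_IDENT }   /* sentinel: anything else is an identifier */
};

/* Linear search; stops at a match or at the sentinel entry. */
int classify(const char *t)
{
    int i = 0;

    while (strcmp(words[i].word, "any_other") != 0 &&
           strcmp(words[i].word, t) != 0)
        i++;
    return words[i].value;
}
```

Because the search falls through to the sentinel, no separate "not found" branch is needed. The comparison is case-sensitive, as it is with strcmp in the listing.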

/* token_set() sets all of the elements of the array token.tok which is part of the
 * structure of a token to 0
 */

token_set()
{
    int i;

    for (i = 0; i < 15; i++) {
        token.tok[i] = '\0';
        token.toktype = _goto;
    }
    return;
}


lexrec lex()
{
    int c, d;
    static int leftbraceflag = 0;

    if (use_temp_tok) {
        use_temp_tok = 0;
        return (temptok);
    }
    else {
        c = getc(stdin);
        token_set();
    }
cont:
    while ((c == ' ') || (c == '\n') || (c == '\t')) {
        if (c == '\n') {
            /* printf("line is #%d\n", line); */
            line++;
        }
        c = getc(stdin);
    }
cont1:
    if (c == '(') {
        d = getc(stdin);
        if (d == '*') {
            c = getc(stdin);
            while (c != '*') {
                if (c == '\n') {
                    line++;
                }
                c = getc(stdin);
            }
            c = getc(stdin);    /* rightpar */
            c = getc(stdin);
            if ((c == ' ') || (c == '\n') || (c == '\t'))
                goto cont;
        }
        else {
            token.tok[0] = '(';
            token.toktype = _leftpar;
            ungetc(d, stdin);
            return (token);
        }
    }
    if (c == '{') {
        while (c != '}') {
            c = getc(stdin);
            if (c == '\n') {
                /* printf("line is #%d\n", line); */
                line++;
            }
        }
        c = getc(stdin);
        if ((c == ' ') || (c == '\n') || (c == '\t'))
            goto cont;
        if (c == '(')
            goto cont1;
    }
    if (isdigit(c)) {
        lexdigit(c);
    }
    else {
        if (isalpha(c)) {
            lexalpha(c);
        }
        else {
            lexswitch(c);
        }
    }
    return (token);
}

/* The following are the parsing functions of the program. Each function is named
 * to correspond to the BNF name of the Pascal construct which it handles */


/* after called must get next token */

program_heading()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    if (token.toktype == _ident) {
        sc++;
        strcpy(stack[sc].tok, token.tok);
        token = lex();
        if (token.toktype == _leftpar) {
            token = lex();
            if (token.toktype == _ident) {
                token = lex();
                while (token.toktype == _comma) {
                    token = lex();
                    if (token.toktype == _ident)
                        token = lex();
                }
                if (token.toktype == _rightpar) {
                    token = lex();
                    if (token.toktype == _semicol)
                        return;
                    else return;
                }
                else return;
            }
            else return;
        }
        else return;
    }
    else return;
}


/* after called must get next token */

label_dec_part()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    if (token.toktype == _int) {
        token = lex();
        while (token.toktype != _semicol) {
            token = lex();
            if (token.toktype == _int)
                token = lex();
            else return;
        }
        return;
    }
    else return;
}


constant()
{
    if ((token.toktype == _int) || (token.toktype == _real))
        return;
    else if ((token.toktype == _plus) || (token.toktype == _minus)) {
        token = lex();
        if ((token.toktype == _int) || (token.toktype == _real))
            return;
    }
    else if (token.toktype == _ident)
        return;
    else if ((token.toktype == _plus) || (token.toktype == _minus)) {
        token = lex();
        if (token.toktype == _ident)
            return;
    }
    else if (token.toktype == _string)
        return;
    else
        return;
}

/* after called must get next token */

const_def()
{
    if (token.toktype == _ident)
        token = lex();
    else return;
    if (token.toktype == _relational_op)
        token = lex();
    else return;
    constant();
}


/* sets tokflag before terminates */

const_def_part()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    const_def();
    token = lex();
    while (token.toktype == _semicol) {
        token = lex();
        if (token.toktype == _ident) {
            const_def();
            token = lex();
        }
        else {
            tokflag = TRUE;
            return;
        }
    }
}


type_definition_part()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    type_definition();
    if (!tokflag4)
        token = lex();
    tokflag4 = FALSE;
    /* type_definition->type->structured_type->
     * unpacked_structured_type->set_type->
     * simple_type (simple_type gets next token) */
    while (token.toktype == _semicol) {
        token = lex();
        if (token.toktype == _ident) {
            type_definition();
            if (!tokflag4)
                token = lex();
            if (tokflag4)
                tokflag4 = FALSE;
        }
        else {
            tokflag = TRUE;
            return;
        }
    }
}

type_definition()
{
    if (token.toktype == _ident) {
        token = lex();
        if (token.toktype == _relational_op) {
            token = lex();
            type();
        }
        return;
    }
    return;
}

scalar_type()
{
    if (token.toktype == _ident) {
        token = lex();
        if (token.toktype == _comma) {
            while (token.toktype != _rightpar) {
                token = lex();
            }
            token = lex();
            return;
        }
        return;
    }
    return;
}

subrange_type()
{
    constant();
    token = lex();
    if (token.toktype == _dotdot) {
        token = lex();
        constant();
    }
}


simple_type()
{
    if (token.toktype == _leftpar) {
        token = lex();
        scalar_type();
        tokflag4 = TRUE;
    }
    else if (token.toktype == _ident) {
        token = lex();
        if (token.toktype == _dotdot) {
            token = lex();
            constant();
            token = lex();
            tokflag4 = TRUE;
        }
        else
            tokflag4 = TRUE;
    }

else  If  ((token.toktype  ==  _ iut )  II  (token. toktype  =  _real)  II  (token.toktype  =_minus)  II  (token. toktype  ==  _plus)  II  (token. toklype 

subrange_type( ): 
token  =  lex( ); 
tokflag4  =  TRUE: 

> 

> 

unpacked_structured_type()
{
    if (token.toktype == _array)
        array_type();
    else if (token.toktype == _record)
        record_type();
    else if (token.toktype == _set)
        set_type();
    else if (token.toktype == _file)
        file_type();
    else return;
}


typed  <  type 

*  simple  type  i  links  to  sec  if  the  token  toklype  is  Jleftpjr  first  *1 
If  ((token.toktype  !=  _pointcr)  AA  (token.toktype  !=  _packed)  AA  (token.toktype  !=  arrayt  AA  (token.toktype  !=  record'  AA  (t 
simple  type!  ); 

If  ((token.toktype  ==  jpacked)  II  ( token. toklype  =  array)  II  (token.toktype  ==  recordi  II  (token.toktype  =  _sct  i  II  (token.toktype 
structured  tvpel ). 

) 

If  (token.toktype  ==  _pointer> 
poinlet  type! ): 


array_type()
{
    token = lex();
    if (token.toktype == _leftbrace) {
        token = lex();
        while (token.toktype != _rightbrace)
            token = lex();
        token = lex();
        if (token.toktype == _of) {
            token = lex();
            type();
        }
    }
}


structured_type()
{
    unpacked_structured_type();
    if (token.toktype == _packed) {
        token = lex();
        unpacked_structured_type();
    }
}

record_type()
{
    if (!tokflag)
        token = lex();
    tokflag = FALSE;
    field_list();
    if (token.toktype == _end) {
        token = lex();
        tokflag4 = TRUE;
    }
}

field_list()
{
    if (token.toktype == _case) {
        variant_part();
        return;
    }
    else if (token.toktype == _ident) {
        fixed_part();
        if (token.toktype == _case)
            variant_part();
        else return;    /* token.toktype == _end */
    }
    else return;
}

fixed_part()
{
    record_section();
    if (!tokflag4) {
        token = lex();
        /* record_section->type->structured_type->
         * unpacked_structured_type->set_type->
         * simple_type (simple_type gets next token) */
    }
    tokflag4 = FALSE;
    while (token.toktype == _semicol) {
        token = lex();
        if ((token.toktype != _end) && (token.toktype != _case)) {
            record_section();
            if (!tokflag4)
                token = lex();
            tokflag4 = FALSE;
        }
        else return;
    }
}


record_section()
{
    if (token.toktype == _ident) {
        token = lex();
        while (token.toktype == _comma) {
            token = lex();
            if (token.toktype == _ident)
                token = lex();
        }
        if (token.toktype == _colon) {
            token = lex();
            type();
        }
    }
}


'.Want  I'.nt'  :  { 


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token  =  lex(): 

If  (token. teletype  ==  ident)  { 
token  =  lex(); 

I*  rig  field  *  i 

If  (token. toktype  ==  _colon)  { 
token  =  text ): 

If  (token. toktype  ==  _ident)  { 
token  =  lex( ): 

if  (token. toktype  =  of)  { 
token  =  lex(); 
variant(); 
token  =  lex(i: 

while  (token. toktype  =  semicol)  f 
token  =  lex(): 

If  (token.toktype  =  _end)  { 

return; 

) 

variant! ); 
token  =  lex(): 

} 

tokilag  =  TRIT;; 


else  If  (token.toktype  ==  _of)  { 
token  =  leK<i. 
varianli  l; 
token  =  lext ); 

while  (token. toktype  ==  semieolt  { 
token  =  lex( ); 

If  (token.toktype  ==  end)  { 

return. 

I 

else  { 

variant! ); 
token  =  lcx(  i; 

> 

> 

t  ok  flap  =■  TRUE: 

return. 


ViuianK  >  { 




case  label  Irili  i; 

If  l token  toktype  ==  eolonl  { 
token  =  lev!  i: 

if  (token  toktype  =-  _lcftp;ut  ( 
token  -  lexi  i. 
field  lieu  ). 

If  (token  t<>ktv|x-  ==  riplttpat  l 

return. 


set_type()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    if (token.toktype == _of) {
        token = lex();
        simple_type();
    }
}


file_type()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    if (token.toktype == _of) {
        token = lex();
        type();
    }
}


pointer_type()
{
    if (token.toktype == _pointer)
        token = lex();
    if (token.toktype == _ident) {
        token = lex();
        tokflag4 = TRUE;
    }
}


variable_declaration_part()
{
    token = lex();
    variable_declaration();
    while (token.toktype == _semicol) {
        token = lex();
        if (token.toktype == _ident)
            variable_declaration();
        else {    /* must return to main */
            tokflag = TRUE;
            return;
        }
    }
    /* variable_declaration->type->structured_type->unpacked_structured_type->
     * set_type->simple_type (simple_type gets next token) */
}


variable_declaration()
{
    if (token.toktype == _ident) {
        token = lex();
        while (token.toktype == _comma) {
            token = lex();
            if (token.toktype == _ident)
                token = lex();
        }
        if (token.toktype == _colon) {
            token = lex();
            type();
        }

tokn.ml  r  TRUE: 

) 

) 
procedure_heading()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    if (token.toktype == _ident) {
        sc++;
        strcpy(stack[sc].tok, token.tok);
        token = lex();
        if (token.toktype == _semicol)
            return;
            /* return to calling function */
        else if (token.toktype == _leftpar) {
            formal_parameter_section();
            while (token.toktype == _semicol) {
                formal_parameter_section();
            }
            if (token.toktype == _rightpar)
                token = lex();
            if (token.toktype == _semicol)
                return;
        }
    }
}


/* formal_parameter_section gets next token */

formal_parameter_section()
{
    token = lex();
    if (token.toktype == _ident)
        parameter_group();
        /* gets next token */
    else if (token.toktype == _var) {
        token = lex();
        parameter_group();
        /* gets next token */
    }
    else if (token.toktype == _function) {
        token = lex();
        parameter_group();
        /* gets next token */
    }
    else if (token.toktype == _procedure) {
        token = lex();
        if (token.toktype == _ident) {
            token = lex();
            while (token.toktype == _comma) {
                token = lex();
                if (token.toktype == _ident)
                    token = lex();
            }
            /* when finished gets next token - rightpar or semicol */
        }
    }
}


/* gets next token either a semicol or rightpar */

parameter_group()
{
    if (token.toktype == _ident) {
        token = lex();
        while (token.toktype == _comma) {
            token = lex();
            if (token.toktype == _ident)
                token = lex();
        }
        if (token.toktype == _colon) {
            token = lex();
            if (token.toktype == _ident)
                token = lex();
        }
    }
    return;
}
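Several of the functions above (program_heading, record_section, parameter_group) recognize the same shape, an identifier followed by zero or more ", identifier" pairs. The sketch below isolates that pattern so the control flow is easier to follow. It is an illustration under assumptions, not the listing's code: it reads from an in-memory array instead of calling lex(), the `T_*` codes and the name `parse_ident_list` are hypothetical, and unlike the listing (which returns silently on unexpected input) it reports a malformed list with -1.

```c
/* Hypothetical token codes; T_END marks the end of the input array. */
enum { T_IDENT, T_COMMA, T_END };

/* Recognizes  ident { "," ident }  starting at toks[*pos].
 * Returns the number of identifiers, or -1 if the list is malformed.
 * On success *pos is left at the first token after the list. */
int parse_ident_list(const int *toks, int *pos)
{
    int n = 0;

    if (toks[*pos] != T_IDENT)
        return -1;
    (*pos)++;
    n++;
    while (toks[*pos] == T_COMMA) {
        (*pos)++;                      /* consume the comma */
        if (toks[*pos] != T_IDENT)
            return -1;                 /* a comma must be followed by an identifier */
        (*pos)++;
        n++;
    }
    return n;
}
```

Leaving the cursor on the first unconsumed token corresponds to the listing's convention that a function may finish "one token beyond" the construct it handles.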

function_heading()
{
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    if (token.toktype == _ident) {
        sc++;
        strcpy(stack[sc].tok, token.tok);
        token = lex();
        if (token.toktype == _colon) {
            token = lex();
            if (token.toktype == _ident) {
                token = lex();
                if (token.toktype == _semicol) {
                    token = lex();
                    tokflag = TRUE;
                    return;
                }
            }
            return;
        }
        else {
            if (token.toktype == _leftpar) {
                formal_parameter_section();
                while (token.toktype == _semicol) {
                    token = lex();
                    formal_parameter_section();
                }
                if (token.toktype == _rightpar) {
                    token = lex();
                    if (token.toktype == _colon) {
                        token = lex();
                        if (token.toktype == _ident) {
                            token = lex();
                            if (token.toktype == _semicol) {
                                token = lex();
                                tokflag = TRUE;
                                return;
                            }
                        }
                    }
                }
            }
        }
    }
}
/* <unlabelled statement> or <label> <unlabelled statement> */

statement()
{
    tokflag = FALSE;
    tokflag1 = tokflag2 = tokflag3 = tokflag4 = FALSE;
    if (token.toktype == _int) {
        token = lex();
        if (token.toktype == _colon) {
            token = lex();
            unlabelled_statement();
        }
    }
    unlabelled_statement();
}


I*  <  simple  ski  If  men  ’  >  or  <struetio  cd  shitcmcnl>  *  1 
unlabelled  statement!  i  { 

if  ((token  toktvpe  ==  begin)  II  (token. toktvpe  == 
stiucluied  statement!  ); 

else 

simple  statement!  I; 

) 



if)  H  (token. toktvpe  ==  case)  II  (token. toktvpe  =  while)  II  (token. toktype  == 


'*  simple  skitemrnl  is  <jssif>nment  skttement>  01  <pi oeedure  statements  01  * 

*  goto  statement  01  empty  statement  * 

simple  statement!  i  {  simple  -Statement 

ifttoken. toktvpe  ==  ident)  ( 
tcmptok2  =  token; 
token  =  Ic x( ). 
temptok '  =  token: 
tokflag l  -  TRUE. 

if  ((token. toktype  !=  assign i  &&  (token. toktype  !=  leftbracei  &&  (token. toktype  !=  _period)  &&  (token.toktype  !=  _pointer) 
procedurc  stalcmcnt!  r. 

else 

If  (token.toktype  !"  semicoli 
assignment  statement!  ): 

else 

tokflag  -  TRUE. 

) 

else  If  (token.toktype  ==  _goto) 
go  to  statement!  i. 

else 

lokflag  --  TRUE. 

) 


*  whin  \iittjhlni  is  finished  c.u*ntin  t.  I<>k1hi£  is  set  bn  ansi  uinabh  K<'t's  *>  *  one  h'kcn  bc\onJ  * 

v  at  tabic 1  (  variable 

if  ( lokflag  I  i  ( 

If  t temptok 2  toktvpe  =-  ident'  ( 

if  i 'temptok?  toktvpe  ==■  leftbrasci  N  t temptok?. toktype  ==  period!  II  (tempt ok?. toktype  =-  pointctl'  { 
If  iteniplok?  toktype  -  pointer  ( 

temptok  t toktvpe  goto 

token  -  lev  . 
tokflag  -  TRI'fc: 

I 

else  If  HemptoV  '  toktype  leltbiase  j 

lcmpt“k  ?  toklx  pe  goto 

token  "  lev  ■. 




expression! ); 

while  (token. loklype  =  commc)  { 
token  =  lex( ); 
expression! ); 

} 

If  (token. toktype  =  rightbrace;  { 
token  =  lex!  I; 
tokflag  =  TRUE. 

) 

) 

else  If  (temptnkVtoktype  ==  _per\odt  { 
leniplok.Vloktypc  =  goto; 
token  =  lex! ): 

If  (token. toktype  =  identl 
token  =  lext). 
tokflag  =  TRUE 

) 

} 

-  *<-/.«• 

loklUm*  =  HOT.  *' 

while  ( (token. toktype  =  leltbracei  II  (token. toktype  ==  jperiod)  II  (token. toktvpe  ==  _pointet)i  { 
if  (token. toktype  —  -  poinlet )  ( 
token  =  lex<  I  . 
tok'lac  -  TRUE: 

> 

else  If  (token. toktype  =-  lettbraec  i  { 
token  -  lex(  t; 
expression!  K 

while  (token  toktype  --  eonunat  { 
token  -  lex 1  i: 
expression' 

> 

If  (token  toktype  -  rightbiaeei  { 
token  =■  lex'  >: 
tokflag  -  TRUE. 

) 

) 

else  If  (token  toktype  =-  peiiod'  { 
token  -  lexii; 

if  ( token  loklype  ==  idem  t  { 
token  =  !e\(  i; 
tokflat;  -  TRUE 

I 

else  return  *i  r>  '<•»  i*  . 

} 


1 

ternptokJ  toktvpe  -  goto 
toklln:  I  -  EAI..NI: 

) 

*  I/O  /  •  no!  m<  * 

else  { 

If  1  t'-inpt"V  I  ml  tvp"  nl'nit  { 

if  1 ! '■  r a < 4A  t  4  a  \  p-  leltbnuet  I1-  ttemplnkN  toktvpe  a —  period-  !l  t temptok.5  tokts p"  -  pomtet 

if  i  lemptok  N  toktr  jv  point'-!'  { 

t"inpt'  4.  N  t"kt>  pi  I’oto 

token  |. 

toklln1  -  TRi'l 

) 

else  If  Iternp'oi/'  loitip"  |r|(b|a"'i  { 

tempt' 4.'  t"kt\pi  got" 

n  -k'-n  lev  1 
CSpl—.sinn'  • 
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> 


while  (token. toktype  =  _comnia)  { 
token  =  lex(l; 
expression!  ); 

> 

if  (token. toktype  =  rightbrace)  { 
token  =  lex(): 
tokflag  =  TRUE; 

} 

} 

else  if  (temptokS. toktype  =  _period)  { 
tcmptok5. toktype  =  _goto; 
token  =  lex! ); 

if  (token.tokfvpe  =  ident)  { 
token  =  lex! ); 
tokflag  =  TRUE; 

) 

) 

> 

tcmptokd.  toktype  =  goto; 
teniplok5.  toktype  =  goto; 

while  ((token. toktype  ==  lefthraeet  II  (token  toktype  =  _period)  II  (token. toktype  —  jpoinlert1  { 
if  (token. toktype  =  pointer )  { 
token  =  lexu. 
tokflag  =  TRUE. 

) 

else  if  (token. toktype  ==  _leftbracc>  { 
token  =  lexi  t; 
expression!  t; 

while  (token. toktvpe  ==  _contnta!  { 
token  =  lex',  i: 
expression! ). 

} 

If  (token. toktype  ==  rightbracci  { 
token  -  lexi  I. 
tokflag  =-  TRUE: 

) 

> 

else  if  (token. toktype  =-  period)  { 
token  -  lex!  i; 

If  (token  toktype  --  ident)  { 
token  -  lex-  !. 
tokflag  =  TR'T: 

I 

) 

) 


else  if  (token  toktvpe  --  ident  i  { 
token  -  lexi  '. 
tokflag  -  TRUE. 

while  ((token  toktvpe  =-  lettbr.nrj  II  (token  toktype  ==  _pcnod '  II  (token  toktype  ==  ^pointer'!  ( 
If  (token. toktvpe  =-  pomiei 1  ( 
token  -  |cv  . 
tokflar  -  TRI  P 

I 

else  If  (token. toktype  —  Jcflhiace  { 
token  lev  \ 
expression1  t. 

while  (token  lokf>  pe  comma'  { 
token  =  lex1  > 
lokllag  -  FALSE 

cxptessjoir  t. 

) 

If  (token  toktvpe  e-  nirlid'int e t 
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) 


token  =  lex(): 
tokflag  =  TRUE: 


else  if  ( token. toktype  =  _period)  { 
token  =  lex(): 

if  (token.toktype  ==  _ident)  { 
token  =  lex(l: 
tokflag  =  TRUE: 

) 

} 

} 

) 

} 

/*  crtd  variable  * 


assignment_statement()
{
    variable();
    if (token.toktype == _assign) {
        token = lex();
        expression();
    }
}


indexed_variable()
{
    expression();
    while (token.toktype == _comma) {
        token = lex();
        expression();
    }
    if (token.toktype == _rightbrace)
        return;
}


/* expression one token beyond because of simple expression */

expression()
{
    simple_expression();
    if ((token.toktype == _relational_op) || (token.toktype == _in)) {
        token = lex();
        simple_expression();
    }
    tokflag = TRUE;
}


relational_op()
{
    if ((token.toktype == _in) || (token.toktype == _relational_op))
        return;
}


simple_expression()
{
    if (tokflag2) {
        if ((temptok4.toktype == _minus) || (temptok4.toktype == _plus)) {
            term();
            while ((token.toktype == _plus) || (token.toktype == _minus) || (token.toktype == _or)) {
                if (token.toktype == _or)
                    count++;
                token = lex();
                term();
            }
        }
        else {
            term();
            while ((token.toktype == _minus) || (token.toktype == _plus) || (token.toktype == _or)) {
                if (token.toktype == _or)
                    count++;
                token = lex();
                term();
            }
        }
        tokflag2 = FALSE;
    }
    /* not tokflag2 */
    else {
        if ((token.toktype == _minus) || (token.toktype == _plus) || (token.toktype == _or)) {
            token = lex();
            term();
            while ((token.toktype == _minus) || (token.toktype == _plus) || (token.toktype == _or)) {
                if (token.toktype == _or)
                    count++;
                token = lex();
                term();
            }
        }
        else {
            term();
            while ((token.toktype == _minus) || (token.toktype == _plus) || (token.toktype == _or)) {
                if (token.toktype == _or)
                    count++;
                token = lex();
                term();
            }
        }
    }
}


‘  tcmi  Ini'  him  ><;//< ./  next  token  <ihctiJ\  nillnl  4 

tetnu  i  (  fCJ'lll 

l.n  ion  i. 
if  i  !tokllag> 

token  =  lexi  i: 
lokflag  =  FALSE: 

while  tl token. loktype  --  mult »  H  ( token,  toktyjx'  ==  divide)  II  (token. loktype  _n>od:  l;  (token.toktvpe  =-  and) 
token  =  lcx< 
facto)' ). 


*  tii'ifi  r  '  wmjlf't  '  >  r't  {  <cwf'i  cssi>n  ^  f  * 

4  -  fun-  rt,>r:  v'M '  '(  >  <sct o'  rt» »;  <;.;.?>»»  **  * 

til-  t«M'  *  { 

if  (token  toktype  --  not?  { 
token  -  |e\it; 

while  ( token  fokfvpe  -  n»»l » 
token  lev  ' 


facia 


I1  (token  tokt%| 




        if (token.toktype == _leftpar) {
            token = lex();
            expression();
            if (token.toktype == _rightpar)
                token = lex();
        }
        if (token.toktype == _leftbrace)
            set();
        if (token.toktype == _ident) {
            function_designator();
            variable();
            tokflag = TRUE;
        }
        unsigned_constant();
    }
    else {
        if (token.toktype == _leftpar) {
            token = lex();
            expression();
            if (token.toktype == _rightpar)
                token = lex();
        }
        if (token.toktype == _leftbrace)
            set();
        if (token.toktype == _ident) {
            function_designator();
            if (token.toktype != _semicol)
                variable();
            tokflag = TRUE;
        }
        unsigned_constant();
    }
}


unsigned_constant()
{
    if (tokflag2) {
        if ((temptok4.toktype == _int) || (temptok4.toktype == _real))
            return;
        if ((temptok4.toktype == _string) || (temptok4.toktype == _ident) || (temptok4.toktype == _nil))
            return;
    }
    else {
        if ((token.toktype == _int) || (token.toktype == _real)) {
            token = lex();
            tokflag = TRUE;
            return;
        }
        if ((token.toktype == _string) || (token.toktype == _ident) || (token.toktype == _nil)) {
            token = lex();
            tokflag = TRUE;
            return;
        }
    }
}


function  designator1  i  (  futlCtU^U  ih'SI  i^lhl ft1/' 

If  i  token  .toktvpe  ==  ident  >  ( 
temptokr  =  token 
token  -  lex'  ]. 

Icinptok  1  -  token. 
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1 


If  (token. toktype  =  leftpar)  { 
token  =  lex(); 
actual  parameter ); 
while  (token. toktype  =  _comma)  { 
token  =  lex(): 
actual_parametei(); 

> 

If  (token. toktype  —  _rightpar) 
token  =  lex(); 

I*  to  get  scmicol  recognized  bv  compound _slatement  *  / 

> 

tokflagl  =  TRUE: 

> 

) 


set()
{
    if (token.toktype == _leftbrace) {
        token = lex();
        element_list();
        /* token = lex(); */
        if (token.toktype == _rightbrace) {
            token = lex();
            tokflag = TRUE;
        }
        else return;
    }
}


/* tokflag set to true if not multielement element list */

element_list()
{
    if (token.toktype != _rightbrace) {
        element();
        if (!tokflag)
            token = lex();
        tokflag = FALSE;
        while (token.toktype == _comma) {
            token = lex();
            element();
            if (!tokflag)
                token = lex();
            tokflag = FALSE;
        }
        if (token.toktype != _comma) {
            return;
        }
    }
}


*  t f  f.'irr:  ,/f;*  >  «-  t pnwn >»:  r  n->:  wr  i;  ,t.-  * 

*  ne if  ln*i'  lii'r  i  /jin  rut-ten!  * 

element  -  {  clctUCUt 

If  < .token  t<  4:1 '« pu  --  ilokl'M  •  { 
tf'kun  Irv  >. 

C  X|M  ess|t  Ml'  ■ , 

} 

tnkfbr  ■  TFI  fl" 
procedure_statement()
{
    if (token.toktype == _leftpar) {
        token = lex();
        actual_parameter();
        if (!tokflag)
            token = lex();
        if (tokflag)
            tokflag = FALSE;
        while (token.toktype == _comma) {
            token = lex();
            actual_parameter();
            if (!tokflag)
                token = lex();
            if (tokflag)
                tokflag = FALSE;
        }
        if (token.toktype == _rightpar) {
            token = lex();
            tokflag = TRUE;
        }
    }
}


actual  jiarameteri  l  {  CICtUCll _para"\Ct€r 

+  iii  this  point  left  jtat  has  already  hern  called  * 

*  tempt  ok  4  -  ft 'ken. 
token  ~  le.u  *, 
tempt  id  5  ~  token, 

tf  (<  token  toktype  --  leftbiaccf  (token  toktype  -  -  1  (token  toktype  -  -  jiointei ))  < 

token  =  le  \{  i, 
vat  tablet ). 

> 

etw  : 

tokilay:  --  TKIT.  * 
expulsion!  i; 


*  it  pt  o(  ain't  tJenitju’  or  fun<  tion  identifier .  * 

*  token  toktype  =-  taenti  is  handieti  b\  \  at  table  * 


) 


go_to_statement() {
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    if (token.toktype
        return;
}


/* <compound statement> or <conditional statement> or <repetitive statement> */

/* or <with statement> */

structured_statement() {
    if (token.toktype == _begin)
        compound_statement();

    if ((token.toktype == _if) || (token.toktype == _case))
        conditional_statement();

    if ((token.toktype == _while) || (token.toktype == _repeat) || (token.toktype == _for))
        repetitive_statement();
    else
        if (token.toktype == _with)
            with_statement();
}


compound_statement() {
    token = lex();
    statement();

    /* statement may go one token past because of simple expression */

    if (!tokflag)
        token = lex();
    tokflag = FALSE;
    while (token.toktype == _semicol) {
        token = lex();
        if (token.toktype == _end) {
            token = lex();
            tokflag = TRUE;
            return;
        }
        else {
            statement();
            if (!tokflag) {
                token = lex();
                tokflag = FALSE;
            }
        }
    }
    if (token.toktype == _end) {
        token = lex();
        tokflag = TRUE;
        return;
    }
}


conditional_statement() {
    if (token.toktype == _if)
        if_statement();
    else if (token.toktype == _case)
        case_statement();
}


if_statement() {
    count++;
    token = lex();
    expression();
    if (token.toktype != _then)
        token = lex();
    if (token.toktype == _then) {
        token = lex();
        statement();
        if (token.toktype == _end) {
            return;
        }
        if (token.toktype == _else) {
            token = lex();
            statement();
            return;
        }
        else
            return;
    }
    printf("In if, count=%d\n", count);
}


case_statement() {
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    expression();

    if (token.toktype == _of) {
        token = lex();
        if ((token.toktype == _plus) || (token.toktype == _minus) || (token.toktype == _int) || (token.toktype == _string)
            case_list_element();
            count++;
            if (!tokflag)
                token = lex();
            if (tokflag)
                tokflag = FALSE;
            while (token.toktype == _semicol) {
                token = lex();
                case_list_element();
                count++;
                if (!tokflag)
                    token = lex();
                if (tokflag)
                    tokflag = FALSE;
            }
            if (token.toktype == _end) {
                token = lex();
                tokflag = TRUE;
                return;
            }
        }
        else {
            if (token.toktype == _end) {
                token = lex();
                tokflag = TRUE;


(token. loktyi 


< 


case_list_element() {
    case_label_list();
    if (token.toktype == _colon) {
        token = lex();
        if (token.toktype == _semicol)
            return;
        else
            statement();
    }
}


case_label_list() {
    constant();
    token = lex();
    while (token.toktype == _comma) {
        token = lex();
        constant();
        token = lex();
    }
    return;
}


repetitive_statement() {
    if (token.toktype == _while)
        while_statement();
    if (token.toktype == _repeat)
        repeat_statement();
    if (token.toktype == _for)
        for_statement();
    return;
}


while_statement() {
    count++;
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    expression();
    if (token.toktype == _do) {
        token = lex();
        statement();
    }
}


repeat_statement() {
    count++;
    token = lex();
    statement();
    if (!tokflag)
        token = lex();
    tokflag = FALSE;
    while (token.toktype == _semicol) {
        token = lex();
        statement();
        if (!tokflag)
            token = lex();
        if (tokflag)
            tokflag = FALSE;
    }
    if (token.toktype == _until) {
        token = lex();
        expression();
    }
    return;
}


for_statement() {
    count++;
    token = lex();
    if (token.toktype == _ident) {
        token = lex();
        if (token.toktype == _assign) {
            token = lex();
            for_list();
            if (token.toktype == _do) {
                token = lex();
                statement();
                return;
            }
        }
    }
}

for_list() {
    expression();
    if ((token.toktype == _to) || (token.toktype == _downto)) {
        token = lex();
        expression();
    }
    tokflag = TRUE;
    return;
}


with_statement() {
    if (!tokflag)
        token = lex();
    if (tokflag)
        tokflag = FALSE;
    record_variable_list();
    if (token.toktype == _do) {
        token = lex();
        statement();
    }
}


record_variable_list() {
    variable();
    if (!tokflag)
        token = lex();
    tokflag = FALSE;
    while (token.toktype == _comma) {
        if (!tokflag)
            token = lex();
        tokflag = FALSE;
        variable();
    }
}


main() {
    int startline, linecount;

    while (!feof(stdin)) {
        if (!tokflag)
            token = lex();
        if (tokflag)
            tokflag = FALSE;
        if (token.toktype == _program)
            program_heading();
        else if (token.toktype == _label)
            label_dec_part();
        else if (token.toktype == _type)
            type_definition_part();
        else if (token.toktype == _const)
            const_def_part();
        else if (token.toktype == _packed) {
            printf("Have packed\n");
            structured_type();
        }

        else if (token.toktype == _array) {
            printf("Have array\n");
            array_type();
        }
        else if (token.toktype == _record) {
            printf("Have record\n");
            record_type();
        }
        else if (token.toktype == _case) {
            printf("Have case\n");
            case_statement();
        }
        else if (token.toktype == _set) {
            printf("Have set\n");
            set_type();
        }
        else if (token.toktype == _file) {
            printf("Have file\n");
            file_type();
        }
        else if (token.toktype == _var)
            variable_declaration_part();
        else if (token.toktype == _procedure)
            procedure_heading();
        else if (token.toktype == _function)
            function_heading();
        else if (token.toktype == _begin) {
            startline = line;
            count = 0;
            compound_statement();
            if (sc >= 0) {
                linecount = line - startline;
                printf("*****%s: %d %d\n", stack[sc].tok, count+1, linecount);
                sc -= 1;
            }
        }

        else if (token.toktype == _if) {
            printf("Have if\n");
            if_statement();
        }
        else if (token.toktype == _while) {
            printf("Have while\n");
            while_statement();
        }
        else if (token.toktype == _repeat) {
            printf("Have repeat\n");
            repeat_statement();
        }
        else if (token.toktype == _for) {
            printf("Have for\n");
            for_statement();
        }
        else if (token.toktype == _with) {
            printf("Have with\n");
            with_statement();
        }
        else if (token.toktype == _pointer) {
            printf("Have pointer\n");
            pointer_type();
        }
        else if (token.toktype == _goto)
            go_to_statement();
    }
}
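
The listing above increments count for each if, while, repeat and for statement and for each case list element, and main reports count+1 for each begin...end block. The fragment below is a minimal, self-contained sketch of that "decisions plus one" arithmetic only. It is not the parser above: it scans raw text for keywords instead of parsing, it does not skip comments or string literals, and it ignores case arms. The names cyclomatic_estimate and is_decision_keyword are hypothetical and do not appear in the original program.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the word w[0..len-1] is one of the
   decision keywords counted in this sketch (compared case-insensitively,
   since Pascal keywords are not case sensitive). */
static int is_decision_keyword(const char *w, size_t len)
{
    static const char *keywords[] = { "if", "while", "repeat", "for" };
    size_t i, j;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i]) != len)
            continue;
        for (j = 0; j < len; j++)
            if (tolower((unsigned char)w[j]) != keywords[i][j])
                break;
        if (j == len)
            return 1;
    }
    return 0;
}

/* Estimate cyclomatic complexity as (number of decision keywords) + 1.
   Assumption: every if/while/repeat/for word in src is a real statement
   keyword, which fails for keywords inside comments or strings. */
int cyclomatic_estimate(const char *src)
{
    int count = 0;
    const char *p = src;

    assert(src != NULL);
    while (*p != '\0') {
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* Collect one whole word so "iffy" is not mistaken for "if". */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            if (is_decision_keyword(start, (size_t)(p - start)))
                count++;
        } else {
            p++;
        }
    }
    return count + 1;
}
```

Under these assumptions, a block with no decisions yields 1, and each additional if, while, repeat or for adds one, which mirrors the count+1 value printed by main.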
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